Researchers at University College London and the Alan Turing Institute found they could correctly identify a Twitter user from a group of 10,000 with 96.7 percent accuracy, using just their tweets and publicly available metadata.
The goal was “to determine if the information contained in users’ metadata is sufficient to fingerprint an account,” and the results reveal how much identifying information is tied to Twitter accounts, whose users may believe they are tweeting anonymously. A single tweet contains about 144 fields of metadata.
“That’s the mentality with metadata,” the study’s lead co-author Beatrice Perez of University College London told Wired. “People think it’s not a big deal.”
Researchers took 14 pieces of metadata (including the date an account was created, its followers, the accounts it follows and the tweets it likes) from 5 million Twitter accounts and ran them through three machine-learning algorithms. The most basic of the three proved the most accurate.
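To make the idea concrete, here is a minimal sketch of metadata fingerprinting. It assumes each account is summarised as a short vector of numeric metadata fields and that a new observation is matched to the closest known account. The account names, field choices and values are hypothetical, and nearest-neighbour matching is a swapped-in illustration, not necessarily one of the three algorithms the researchers used.

```python
import math

# Hypothetical numeric metadata per known account (not the study's actual feature set).
# Field order: followers, following, likes, tweets, account age in days.
KNOWN_ACCOUNTS = {
    "alice": [1200, 300, 5400, 8000, 2900],
    "bob": [45, 210, 120, 600, 400],
    "carol": [98000, 150, 22000, 41000, 3650],
}


def log_scale(vector):
    # Counts span several orders of magnitude, so compress them before comparing.
    return [math.log1p(x) for x in vector]


def fingerprint(observation, known=KNOWN_ACCOUNTS):
    """Return the known account whose metadata is closest to `observation`."""
    target = log_scale(observation)

    def distance(name):
        # Euclidean distance between log-scaled metadata vectors.
        return math.dist(target, log_scale(known[name]))

    return min(known, key=distance)


# A later snapshot of an account, with slightly drifted counts, still matches.
print(fingerprint([1250, 305, 5600, 8100, 2950]))  # prints "alice"
```

Log-scaling is a design choice here so that an account with tens of thousands of followers does not dominate the distance; a real system would learn which fields matter rather than weighting them equally.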
The identification methods could be used to track an account that changes its name, to link multiple accounts created by the same user, or to detect when a legitimate account has been taken over by a malicious user.
The researchers also found that obfuscation strategies are ineffective: even when 60 percent of the data was muddled or altered, users could still be classified with more than 95 percent accuracy.
When the researchers widened the search to the 10 most likely candidates, the correct account was among them 99.22 percent of the time.
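The 99.22 percent figure is a top-10 accuracy: a prediction counts as correct if the true account appears anywhere among the 10 highest-ranked candidates. A minimal sketch of that metric follows, assuming the classifier outputs a score for each candidate account; the function name and the toy scores are hypothetical.

```python
from typing import Dict, List


def top_k_accuracy(scores: List[Dict[str, float]], truth: List[str], k: int) -> float:
    """Fraction of cases where the true account is among the k highest-scoring candidates."""
    hits = 0
    for case_scores, true_account in zip(scores, truth):
        # Rank candidate accounts from most to least likely.
        ranked = sorted(case_scores, key=case_scores.get, reverse=True)
        if true_account in ranked[:k]:
            hits += 1
    return hits / len(truth)


# Toy example: three test cases, each scoring three candidate accounts.
scores = [
    {"alice": 0.7, "bob": 0.2, "carol": 0.1},  # true account ranked 1st
    {"alice": 0.5, "bob": 0.4, "carol": 0.1},  # true account "bob" ranked 2nd
    {"alice": 0.1, "bob": 0.3, "carol": 0.6},  # true account ranked 1st
]
truth = ["alice", "bob", "carol"]

print(top_k_accuracy(scores, truth, k=1))  # 2 of 3 cases correct
print(top_k_accuracy(scores, truth, k=2))  # 3 of 3 cases correct
```

Because widening k can only add hits, top-k accuracy never falls as k grows, which is consistent with the rise from 96.7 percent for a single guess to 99.22 percent for ten.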
While the study uses Twitter as its subject, its authors note “the methods presented in this work are generic and can be applied to a variety of social media platforms with similar characteristics in terms of metadata.”
The researchers say the results have strong implications for “the design of metadata obfuscation strategies,” not just on Twitter but on most social media platforms.