Date: 2023-11-01

A recent study conducted by researchers at ETH Zurich in Switzerland has shed light on the extent to which personal information can be inferred from our online activities. They found that large language models (LLMs) such as GPT-4 can infer a person’s age, location, gender, and even income with up to 85% accuracy by analyzing their social media posts.

To conduct the study, Robin Staab and Mark Vero had nine LLMs examine a database of Reddit posts and extract identifying information based on the way users wrote. From a pool of 1,500 profiles, they narrowed down their selection to 520 users for whom they could confidently determine attributes like place of birth, income bracket, gender, and location.

The results revealed that some LLMs identified these attributes with impressive accuracy. GPT-4, in particular, achieved the highest overall accuracy at 85%, while a less powerful LLM called Llama-2-7b had the lowest accuracy at 51%.

Staab emphasizes that this study highlights how much personal information we unknowingly disclose online. It’s common for users to overlook the fact that their age or location can be directly inferred from their writing style, but LLMs have proven to be quite capable in this regard.

Interestingly, the LLMs not only picked up on explicitly stated personal details in the posts, such as income mentioned in financial advice forums, but also on subtler cues like location-specific slang. By considering a user’s profession and location, they could even estimate a salary range.

However, the study revealed that some characteristics were easier for the LLMs to discern than others. GPT-4, for example, was 97.8% accurate in guessing gender but only 62.5% accurate in predicting income.
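
The gap between a model's per-attribute and overall accuracy is easier to see with a small calculation. The following is a minimal Python sketch, not the researchers' actual evaluation code: the function name `accuracy_by_attribute`, the record layout, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_attribute(records):
    """Return (per-attribute accuracy, overall accuracy).

    `records` is an iterable of (attribute, predicted, actual) tuples.
    This layout is an assumption made for the example; it is not the
    format used in the study.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for attribute, predicted, actual in records:
        total[attribute] += 1
        # Compare guesses case-insensitively, ignoring stray whitespace.
        if predicted.strip().lower() == actual.strip().lower():
            correct[attribute] += 1

    per_attribute = {name: correct[name] / total[name] for name in total}
    n_total = sum(total.values())
    overall = sum(correct.values()) / n_total if n_total else 0.0
    return per_attribute, overall


# Made-up toy data, NOT results from the study.
toy_records = [
    ("gender", "female", "female"),
    ("gender", "male", "male"),
    ("income", "high", "medium"),
    ("income", "low", "low"),
    ("location", "Zurich", "zurich"),
]

per_attribute, overall = accuracy_by_attribute(toy_records)
for name, value in per_attribute.items():
    print(f"{name}: {value:.1%}")
print(f"overall: {overall:.1%}")
```

In this toy data, every gender and location guess is right while only half of the income guesses are, so a single overall figure hides a weaker attribute. The same pattern explains how GPT-4's 85% overall accuracy can sit alongside a much lower score on income.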

As Alan Woodward from the University of Surrey notes, we are only scratching the surface when it comes to understanding how privacy is impacted by the use of LLMs. With the increasing prevalence of AI-powered language models, it is crucial for individuals to be mindful of the personal information they inadvertently reveal online.

