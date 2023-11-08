A recent LinkedIn data leak has shed light on a significant problem in the digital underground: fake and inflated data. The dataset, which criminals offered for sale on the dark web, was initially claimed to cover nearly 20 million accounts. However, analysis by Troy Hunt, the operator of the Have I Been Pwned service, revealed that the data was largely outdated and artificially constructed.

Hunt’s investigation found that the data was not obtained through a breach of LinkedIn’s security systems but through scraping, a technique in which bots and scripts extract publicly available data from platforms such as LinkedIn. Scraping remains a common way for cybercriminals to acquire user data. As in the recent Duolingo data leak, which exposed 2.6 million user records, the LinkedIn data was gathered by scraping.

The allegedly leaked dataset consisted of about 2.5 million entries of publicly available LinkedIn profile data, combined with 5.8 million email addresses generated from people’s first and last names. Hunt spotted the inconsistency when he noticed multiple email aliases attached to a single profile; this pattern is what inflated the apparent count from 2.5 million profiles to 5.8 million accounts.
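A minimal sketch of the kind of check that surfaces this inflation: group the email addresses by profile and count how many profiles carry more than one. The sample rows and the `alias_report` helper are hypothetical; the actual layout of the leaked file is not described in the source.

```python
from collections import defaultdict

# Hypothetical sample rows: (profile_id, email). The real field names in the
# leaked dataset are unknown; these are placeholders for illustration.
rows = [
    ("p1", "jane.doe@example.com"),
    ("p1", "jdoe@example.com"),
    ("p1", "janedoe@example.com"),
    ("p2", "sam.lee@example.org"),
]

def alias_report(rows):
    """Group emails by profile and report profiles that carry several aliases."""
    emails_by_profile = defaultdict(set)
    for profile_id, email in rows:
        emails_by_profile[profile_id].add(email.lower())
    total_profiles = len(emails_by_profile)
    total_emails = sum(len(emails) for emails in emails_by_profile.values())
    # Profiles with more than one address are the ones that inflate the count.
    multi = {p: sorted(e) for p, e in emails_by_profile.items() if len(e) > 1}
    return total_profiles, total_emails, multi

profiles, emails, multi = alias_report(rows)
print(f"{profiles} profiles, {emails} emails, {len(multi)} with multiple aliases")
```

When the number of emails is much larger than the number of profiles, as with 2.5 million profiles against 5.8 million addresses, the per-profile breakdown shows where the extra "accounts" come from.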

On closer examination, it became apparent that the email addresses in the dataset had been fabricated. They followed a consistent pattern: the domain matched the employer’s actual domain, and the alias combined the individual’s first and last name. This pattern held across every profile with multiple email addresses.
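The fabrication pattern described here can be expressed as a simple test: does an address use the employer’s domain with a local part built from the person’s name? The sketch assumes a small, non-exhaustive set of alias formats (`jane.doe`, `jdoe`, and so on), since the exact formats in the dataset are not listed; `name_aliases` and `looks_constructed` are hypothetical helpers.

```python
def name_aliases(first, last):
    """Common first/last-name alias formats (an assumed, non-exhaustive set)."""
    f, l = first.lower(), last.lower()
    return {
        f + l,           # janedoe
        f + "." + l,     # jane.doe
        f + "_" + l,     # jane_doe
        f[0] + l,        # jdoe
        f[0] + "." + l,  # j.doe
        f + l[0],        # janed
        l + f,           # doejane
        l + "." + f,     # doe.jane
    }

def looks_constructed(email, first, last, employer_domain):
    """True if the address uses the employer domain and a name-based alias."""
    local, _, domain = email.lower().partition("@")
    return domain == employer_domain.lower() and local in name_aliases(first, last)

# Example: one hypothetical profile checked against three candidate addresses.
profile = ("Jane", "Doe", "example.com")
for email in ["jane.doe@example.com", "jdoe@example.com", "jane.doe@gmail.com"]:
    print(email, looks_constructed(email, *profile))
```

Applied to every address on a profile, a check like this flags profiles whose addresses are all name permutations on the employer’s domain, which matches the pattern of generated rather than leaked data.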

The dataset also included headers referencing other sources, such as Salesforce, Spendesk, and HubSpot, suggesting that it may have been compiled from several sources rather than produced by a single scraping operation against LinkedIn.

The motive behind inflating the dataset remains unclear; Hunt speculates that it could be driven by profit or a desire for notoriety. Either way, the LinkedIn incident is a sobering reminder of how prevalent fake and artificially constructed data is in the digital landscape.

FAQ

What is scraping?

Scraping is a technique that uses bots or scripts to automatically extract publicly available information from websites.
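Real scrapers fetch pages over HTTP, often at scale, before pulling fields out of the markup. The sketch below shows only the extraction step, run on a static HTML string with Python’s standard `html.parser` module; the markup, the `name`/`title` class names, and `ProfileParser` are hypothetical and do not reflect LinkedIn’s actual pages.

```python
from html.parser import HTMLParser

# A static snippet standing in for a downloaded public page (hypothetical markup).
PAGE = """
<div class="profile"><span class="name">Jane Doe</span>
<span class="title">Engineer at Example Corp</span></div>
<div class="profile"><span class="name">Sam Lee</span>
<span class="title">Analyst at Example Org</span></div>
"""

class ProfileParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="title"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # which field the current span holds, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "title"):
            self._field = cls
            if cls == "name":
                self.records.append({})  # a name span starts a new record

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Text outside the spans of interest (e.g. newlines) is ignored.
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()

parser = ProfileParser()
parser.feed(PAGE)
parser.close()
for record in parser.records:
    print(record)
```

A bot repeating this over many pages turns public profile text into structured rows, which is how a scraped dataset like the one described above is assembled.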

What is a data leak?

A data leak is the unauthorized access to or disclosure of sensitive information. In the LinkedIn case, the term was applied to data presented as exposed user records, although most of it turned out to be scraped from public profiles or fabricated.

What measures can users take to protect their data?

Users can protect their data by regularly updating their passwords, enabling two-factor authentication, and being cautious about sharing personal information online. It is also important to monitor online accounts for suspicious activity and to use reputable security tools and services.
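Many authenticator apps used for two-factor authentication generate time-based one-time passwords as specified in RFC 6238: an HMAC over the number of 30-second intervals since the Unix epoch, truncated to a short numeric code. The sketch below shows that computation with Python’s standard library for illustration only; the example secret is a placeholder, and `totp` is a hypothetical helper rather than any service’s API.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, for_time=None, step=30, digits=6):
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1 variant)."""
    # Base32 secrets are often shown without padding; restore it before decoding.
    secret_b32 += "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(secret_b32, casefold=True)
    t = time.time() if for_time is None else for_time
    counter = int(t // step)                # number of elapsed time steps
    msg = struct.pack(">Q", counter)        # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F              # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Placeholder secret; a real one comes from the service's 2FA setup QR code.
print(totp("JBSWY3DPEHPK3PXP"))
```

Because the code depends on a shared secret and the current time, a password taken from a leaked or scraped dataset is not enough on its own to log in when two-factor authentication is enabled.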