Cambridge Analytica had profile information for some 50 million Facebook users, according to reports.
Now we know what prompted Facebook to suspend Cambridge Analytica, the data analytics firm the Trump campaign used during the 2016 election: The company was trying to get ahead of big stories about Cambridge in both The New York Times and the Observer.
Both stories hit Saturday morning and claim that Cambridge Analytica amassed a data trove with information on more than 50 million Facebook users, collected without their permission.
That’s a much larger number than Facebook reported last night, when it said that just 270,000 people “gave their consent” to hand over data to a third-party researcher and University of Cambridge professor named Dr. Aleksandr Kogan.
How does that work? Back in 2015, Kogan, who also worked at a company called Global Science Research, created an app called “thisisyourdigitallife,” which used Facebook’s login feature that lets people join a third-party app with their Facebook account instead of creating a new app-specific account. Some 270,000 people logged into the app that way, granting Kogan permission under Facebook’s rules to scrape some of their profile data, including their identity and things that they’ve “liked.”
But that permission also gave Kogan access to data about the friend networks of these 270,000 people, which amounted to tens of millions of Facebook users, according to The Times. Kogan then shared that data with Cambridge Analytica, which was “building psychographic profiles” on American voters in order to target them with ads.
Here’s a key graph from the Times’s story:
“[Kogan] ultimately provided over 50 million raw profiles to the firm, Mr. Wylie said, a number confirmed by a company email and a former colleague. Of those, roughly 30 million contained enough information, including places of residence, that the company could match users to other records and build psychographic profiles. Only about 270,000 users — those who participated in the survey — had consented to having their data harvested.”
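For a rough sense of scale, the figures in that passage can be put side by side. The short Python sketch below uses only the numbers reported by the Times. The variable names are illustrative, and because the Times says “over 50 million,” the per-user figure is best read as a rough lower bound.

```python
# Figures as reported by The New York Times (all approximate).
consenting_users = 270_000        # people who logged into the app
raw_profiles = 50_000_000         # "over 50 million" raw profiles provided
matchable_profiles = 30_000_000   # "roughly 30 million" with enough detail to match

# How many profiles were collected for each person who actually consented.
profiles_per_consenting_user = raw_profiles / consenting_users

# What fraction of the collected profiles came with consent.
consent_share = consenting_users / raw_profiles

# What fraction of the profiles could be matched to other records.
matchable_share = matchable_profiles / raw_profiles

print(f"Profiles per consenting user: about {profiles_per_consenting_user:.0f}")
print(f"Share of profiles with consent: {consent_share:.2%}")
print(f"Share of profiles detailed enough to match: {matchable_share:.0%}")
```

On these numbers, each consenting user corresponds to roughly 185 collected profiles, and about half a percent of the profiles came from people who agreed to share their data.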
Kogan and Cambridge Analytica both certified to Facebook that they had destroyed this data back in 2015, but “copies of the data still remain beyond Facebook’s control,” The New York Times is reporting.
Cambridge Analytica claims that the data has been deleted, and that it had no idea it was collected in ways that violated Facebook’s terms of service.
“When it subsequently became clear that the data had not been obtained by GSR in line with Facebook’s terms of service, Cambridge Analytica deleted all data received from GSR,” a company spokesperson said in a statement sent to Recode. “We worked with Facebook over this period to ensure that they were satisfied that we had not knowingly breached any of Facebook’s terms of service and also provided a signed statement to confirm that all Facebook data and their derivatives had been deleted.”
“No data from GSR was used by Cambridge Analytica as part of the services it provided to the Donald Trump 2016 presidential campaign,” the statement added.
Facebook, for its part, is adamant that the company did nothing wrong: The data was collected appropriately under its terms of service and was then abused by the collector. Facebook’s Chief Security Officer Alex Stamos said it bluntly on Twitter Saturday morning: “[Kogan] lied to those users and he lied to Facebook about what he was using the data for.”
The researcher in question, Aleksandr Kogan, enticed several hundred thousand individuals to use Facebook to login to his personality quiz in 2014. He lied to those users and he lied to Facebook about what he was using the data for.
— Alex Stamos (@alexstamos) March 17, 2018
It’s an illuminating look at how Cambridge Analytica and the Trump campaign “won” Facebook during the campaign — Trump’s Facebook strategy has been identified as a key factor in his surprising victory.
But the stories also leave a number of unanswered questions:
- How helpful was the data in targeting U.S. voters? How much of a difference did it make?
- Will Facebook change its policies to further limit the data that third parties can collect from its users?
- How much of the data is still out there online, and is it being used by the Trump campaign today?