AOL Proudly Releases Massive Amounts of Private Data
AOL must have missed the uproar over the DOJ’s demand for “anonymized” search data last year that caused all sorts of pain for Microsoft and Google. That’s the only way to explain its release of data that includes 20 million web queries from 650,000 AOL users.
The data includes all searches from those users for a three-month period this year, as well as whether they clicked on a result, what that result was and where it appeared on the result page. It’s a 439 MB compressed download that expands to just over 2 GB. The data is available here (SEE BELOW) and the output is in ten tab-delimited text files.
The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often make it easy to determine who the user is and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
HERE IT IS - Lots of mirrors - 439 MB file - Enjoy!
This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.
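For anyone wondering what working with a tab-delimited query log like this looks like, here is a minimal Python sketch that groups queries by anonymous user ID. The column names (AnonID, Query, QueryTime, ItemRank, ClickURL) and their order are assumptions based on the description above, not something confirmed here, and the sample rows are invented for illustration. The function name group_queries_by_user is hypothetical as well.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows. The header names and column order are ASSUMPTIONS
# about the file layout; check the real files before relying on them.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tcheap flights\t2006-03-01 10:00:00\t1\thttp://example.com\n"
    "1001\tweather forecast\t2006-03-02 09:30:00\t\t\n"
    "1002\tpizza recipes\t2006-03-01 12:15:00\t3\thttp://example.org\n"
)


def group_queries_by_user(lines):
    """Return a dict mapping each anonymous ID to its queries, in file order."""
    # QUOTE_NONE keeps quote characters inside queries as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    by_user = defaultdict(list)
    for row in reader:
        by_user[row["AnonID"]].append(row["Query"])
    return dict(by_user)


grouped = group_queries_by_user(io.StringIO(SAMPLE))
for user_id, queries in grouped.items():
    print(user_id, queries)
```

To run it against a real file, pass an open file handle instead of the io.StringIO wrapper. Once every query sits under a single ID like this, it is easy to see why swapping the username for a random number does little to hide who the user is.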
If you don't understand the implications of this file and what you can do with it, don't bother with it. For those of you who are into SEO and running a profitable website, you'll find a lot of useful information here.













