Tue, Oct 13th

Data mining example. How to figure out how many users Digg has by scraping some info.

John Graham-Cumming's data mining of digg.com

Image courtesy of John's website; the original link is here.

As I've been working my way through Ben Fry's book Visualizing Data, I'm starting to understand how simple it is to collect some really interesting data. One instance that comes to mind is John Graham-Cumming's exploration of how many users digg.com has.

"I obtained this number by finding random Digg users and extracting their user id. The user id is in a hidden HTML form input field on each Digg user's page. The Digg user page also gives their date of registration. Using this I was able to plot every month from December 2004 (when Kevin Rose registered) up to this month."

What I haven't been able to discern is whether he did this by hand or with code. I would assume code, in order to get a reasonably large sample, but then again, if Digg is indeed auto-incrementing its user ids, then perhaps collecting a sample by hand would be enough. Regardless, I would be curious to see what a script to scrape this kind of data would look like.
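Here is a rough sketch of what such a script might look like in Python. To be clear, this is my guess and not John's actual method. I don't know what the hidden field is really called or how the registration date is displayed, so the field name `userid`, the `Joined: YYYY-MM-DD` text, and the sample pages are all made up for illustration. Fetching the pages is left out; the sketch only covers pulling the id and date out of a page's HTML and then, assuming the ids auto-increment, taking the highest id seen in each registration month as a lower bound on the total number of users by that month.

```python
import re
from collections import defaultdict
from datetime import datetime
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.hidden[attrs["name"]] = attrs.get("value") or ""


def parse_profile(html, id_field="userid"):
    """Return (user_id, registration_date), or None if either is missing.

    ASSUMPTION: the field name "userid" and the "Joined: YYYY-MM-DD" text
    are hypothetical; the real page markup would need to be inspected.
    """
    parser = HiddenInputParser()
    parser.feed(html)
    value = parser.hidden.get(id_field, "")
    match = re.search(r"Joined:\s*(\d{4}-\d{2}-\d{2})", html)
    if not value.isdigit() or match is None:
        return None
    joined = datetime.strptime(match.group(1), "%Y-%m-%d").date()
    return int(value), joined


def users_by_month(samples):
    """Map 'YYYY-MM' -> highest user id seen among users registered that month.

    If ids auto-increment, this is a lower bound on total users by that month.
    """
    highest = defaultdict(int)
    for user_id, joined in samples:
        key = joined.strftime("%Y-%m")
        highest[key] = max(highest[key], user_id)
    return dict(sorted(highest.items()))


# Made-up profile pages standing in for real scraped HTML.
SAMPLE_PAGES = [
    '<form><input type="hidden" name="userid" value="17"></form><p>Joined: 2004-12-20</p>',
    '<form><input type="hidden" name="userid" value="950"></form><p>Joined: 2005-01-15</p>',
    '<form><input type="hidden" name="userid" value="1200"></form><p>Joined: 2005-01-28</p>',
    '<form><input type="hidden" name="userid" value="3400"></form><p>Joined: 2005-02-10</p>',
]

# Pages that fail to parse are skipped rather than aborting the run.
samples = [p for p in map(parse_profile, SAMPLE_PAGES) if p is not None]
estimate = users_by_month(samples)
for month, user_count in estimate.items():
    print(month, user_count)
```

With the made-up pages above, the estimate is at least 17 users by the end of 2004-12, 1200 by 2005-01, and 3400 by 2005-02. This also shows why a small sample could be enough: each month only needs one user who registered near the end of it.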