Facebook Spider
Thursday, January 26th, 2006
I’ve written a Perl program to spider Facebook. I was looking for a way to quickly generate statistics about the University of California, Berkeley student population, and I figured that since almost everybody had a Facebook account, I could dump all of Facebook’s information into a database and generate reports from that information. Since this program has proven useful, I’ve decided to release it to the general public.
How It Works
If you’re unfamiliar with the term spider, I recommend that you read the Wikipedia page on web spiders for a thorough discussion of how a spider works. In a nutshell, my program visits a Facebook user’s profile, scans the friends list for links to other profiles, visits each of those profiles, scans their friends lists in turn, and so on. Along the way, my program also scans each user’s profile for information, parses it, and inserts it into a SQL database.
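The traversal described above can be sketched in a few lines of Perl. This is not the original script, just a minimal illustration: the %friends_of hash is a made-up stand-in for the friends lists that a real spider would fetch and parse from profile pages, and the crawl function name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Facebook: maps a UID to that user's friend UIDs.
# A real spider would download and parse each profile page instead.
my %friends_of = (
    1 => [2, 3],
    2 => [1, 4],
    3 => [1, 4],
    4 => [2, 3, 5],
    5 => [4],
);

# Visit every profile reachable from $start exactly once, breadth-first.
sub crawl {
    my ($start) = @_;
    my %seen  = ($start => 1);   # UIDs already queued, so cycles terminate
    my @queue = ($start);        # profiles waiting to be visited
    my @order;                   # visit order, returned for inspection

    while (@queue) {
        my $uid = shift @queue;
        push @order, $uid;       # a real spider would parse and store the profile here

        for my $friend (@{ $friends_of{$uid} || [] }) {
            next if $seen{$friend}++;   # skip profiles that are already queued
            push @queue, $friend;
        }
    }
    return @order;
}

print join(' ', crawl(1)), "\n";   # prints "1 2 3 4 5"
```

The %seen hash is what keeps the spider from looping forever, since friendships point in both directions.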
Features
I’m only aware of one other Facebook spider: a Perl script written by Michael Kelly. However, Michael’s script only collects information about a user’s friends. My script captures all the information available in a user’s profile (except for the ‘About Me’ field). Furthermore, my script provides the following enhancements:
- Multi-threaded support. Each user’s profile is processed in its own thread. The total number of threads can be set using a command-line parameter, and the program uses semaphores to enforce the maximum number of threads.
- SQL database storage. My script stores user information in a SQL database ordered by Facebook UID. I’ve used relatively simple queries throughout the script, so any SQL database should be supported (for example, MySQL and PostgreSQL should work). However, I’ve chosen SQLite3 as the default database. If you wish to use another database type, install the appropriate DBD driver and modify the database handle line to use that driver.
- Easy data processing. Since all data is stored in a SQL database, it should be relatively easy to write programs that query the database for information.
- Sleep between threads. It’s possible to provide a value, in seconds, that my script should wait before spawning a new thread. This should prevent the script from overloading the Facebook servers.
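For readers curious how the thread limit and the delay between threads fit together, here is a minimal sketch using Perl’s core threads and Thread::Semaphore modules. It assumes a Perl built with thread support. The names crawl_all and process_profile are hypothetical, the UIDs are made up, and process_profile is only a placeholder for the real fetching and parsing work; the original script may be organized differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

# Placeholder for fetching and parsing one profile (hypothetical).
sub process_profile {
    my ($uid) = @_;
    return "processed $uid";
}

# Process each UID in its own thread, never running more than
# $max_threads at once and pausing $delay seconds between spawns.
sub crawl_all {
    my ($max_threads, $delay, @uids) = @_;
    my $slots = Thread::Semaphore->new($max_threads);

    my @workers;
    for my $uid (@uids) {
        $slots->down;                    # block until a thread slot is free
        push @workers, threads->create(sub {
            my $result = eval { process_profile($uid) };
            $slots->up;                  # release the slot even if the work failed
            return $result;
        });
        sleep $delay;                    # wait before spawning the next thread
    }

    # Collect results in the order the threads were started.
    return map { $_->join } @workers;
}

print "$_\n" for crawl_all(2, 1, 101 .. 104);   # made-up UIDs
```

The semaphore starts with one permit per allowed thread, so the main loop stalls on down until a worker finishes and calls up.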
Quick Start
Assuming you have all the necessary Perl modules and sqlite3 installed:
- Create a SQLite3 database:
$ sqlite3 database.db 'CREATE TABLE userdata ( uid integer, name, friends, school, status, sex, concentration, residence, hometown, highschool, screenname, mobile, website, lookingfor, interestedin, relationshipstatus, politicalviews, interests, clubsjobs, favoritemusic, favoritemovies, favoritebooks );'
- Create a facebook.conf:
$ cp facebook.conf.sample facebook.conf
$ vim facebook.conf
- Start the script:
$ ./facebook.pl -t 2 -s 10 -f database.db [SOME FACEBOOK UID]
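Once the userdata table has some rows in it, generating a report comes down to a query. The sketch below assumes the DBI and DBD::SQLite modules are installed. To stay self-contained it builds an in-memory database with only a few of the userdata columns and inserts made-up rows; the names and values are purely illustrative, not real spider output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory database so the sketch runs on its own; change dbname to
# database.db to query real spider output instead.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# A subset of the userdata columns from the Quick Start.
$dbh->do('CREATE TABLE userdata ( uid integer, name, sex, concentration )');

# Made-up rows for illustration only.
my $insert = $dbh->prepare('INSERT INTO userdata VALUES (?, ?, ?, ?)');
$insert->execute(@$_) for (
    [1, 'Alice', 'Female', 'Computer Science'],
    [2, 'Bob',   'Male',   'Computer Science'],
    [3, 'Carol', 'Female', 'History'],
);

# Count users per concentration, largest group first.
my $rows = $dbh->selectall_arrayref(
    'SELECT concentration, COUNT(*) FROM userdata
     GROUP BY concentration ORDER BY COUNT(*) DESC, concentration'
);
printf "%-20s %d\n", @$_ for @$rows;

$dbh->disconnect;
```

Switching to another database should mostly be a matter of changing the connection string passed to DBI->connect, provided the matching DBD driver is installed.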
I Want It!
The script has been removed at Facebook’s request.
Notes
I haven’t tested the script lately, but it should still work. If it doesn’t, post a comment, and I’ll release an update.
Since my script parses the HTML returned from Facebook, if Facebook makes any changes to their profile layouts, I’ll have to make major modifications to the code.
Future
I’m in the process of designing an interface to Facebook that resembles Google Maps. Users will be able to interactively visualize their friend network, and clicking a user’s “node” should bring up their Facebook profile in a new window. More details will be forthcoming.


