Archive for the ‘Open-Source’ Category

Facebook Spider

Thursday, January 26th, 2006

I’ve written a Perl program to spider Facebook. I was looking for a way to quickly generate statistics about the University of California, Berkeley student population, and I figured that since almost everybody had a Facebook account, I could dump all of Facebook’s information into a database and generate reports from that information. Since this program has proven useful, I’ve decided to release it to the general public.

How It Works

If you’re unfamiliar with the term spider, I recommend that you read the Wikipedia page on web spiders for a thorough discussion of how a spider works. In a nutshell, my program visits a Facebook user’s profile, scans the friends list for links to other profiles, then visits each of those profiles and scans their friends lists in turn, and so on. Along the way, my program also scans each user’s profile for information, parses it, and inserts it into a SQL database.
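The crawl described above can be sketched in Python as a breadth-first traversal of the friend graph. This is not the original Perl code: `fetch_profile` and the `FAKE_PROFILES` dictionary are hypothetical stand-ins for downloading and parsing a real profile page, the breadth-first order is my assumption, and storing friends as a space-separated list of UIDs is an assumed format. The table uses only a few of the columns from the Quick Start schema.

```python
import sqlite3
from collections import deque

# Hypothetical stand-in for fetching and parsing a profile page.
# A real spider would download the HTML and extract these fields.
FAKE_PROFILES = {
    1: {"name": "Alice", "school": "Berkeley", "friends": [2, 3]},
    2: {"name": "Bob", "school": "Berkeley", "friends": [1, 4]},
    3: {"name": "Carol", "school": "Stanford", "friends": [1]},
    4: {"name": "Dave", "school": "Berkeley", "friends": [2]},
}

def fetch_profile(uid):
    """Return parsed profile data for uid, or None if unavailable (hypothetical)."""
    return FAKE_PROFILES.get(uid)

def crawl(start_uid, db, max_profiles=None):
    """Breadth-first crawl of the friend graph starting at start_uid."""
    seen = {start_uid}
    queue = deque([start_uid])
    visited = 0
    while queue and (max_profiles is None or visited < max_profiles):
        uid = queue.popleft()
        profile = fetch_profile(uid)
        if profile is None:
            continue
        visited += 1
        # Store the profile; friends are kept as a space-separated UID list (assumed format).
        db.execute(
            "INSERT INTO userdata (uid, name, friends, school) VALUES (?, ?, ?, ?)",
            (uid, profile["name"],
             " ".join(str(f) for f in profile["friends"]), profile["school"]),
        )
        # Queue every friend we have not seen yet.
        for friend in profile["friends"]:
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    db.commit()
    return visited

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE userdata (uid integer, name, friends, school)")
print(crawl(1, db))  # 4
print(db.execute("SELECT uid, name FROM userdata ORDER BY uid").fetchall())
# [(1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave')]
```

The `seen` set is what keeps a spider like this from revisiting profiles: friendships are mutual, so without it the crawl would bounce between the same two users forever.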

Features

I’m only aware of one other Facebook spider: a Perl script written by Michael Kelly. However, Michael’s script only collects information about users’ friends. My script captures all the information available in a user’s profile (except for the ‘About Me’ field). Furthermore, my script provides the following enhancements:

  • Multi-threaded support. Each user’s profile is processed in its own thread. The maximum number of concurrent threads can be set using a command-line parameter, and the program uses semaphores to enforce that limit.
  • SQL database storage. My script stores user information in a SQL database, keyed by Facebook UID. I’ve used relatively simple queries throughout the script, so any SQL database should be supported (e.g., MySQL and PostgreSQL should work). However, I’ve chosen SQLite3 as the default database. If you wish to use another database, install the appropriate DBD driver and modify the line that creates the database handle to use that driver.
  • Easy data processing. Since all data is stored in a SQL database, it should be relatively easy to write programs that query the database for information.
  • Sleep between threads. It’s possible to provide a value, in seconds, that my script should wait before spawning a new thread. This should prevent the script from overloading the Facebook servers.
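A rough Python sketch of the thread limit and spawn delay, using Python’s `threading.BoundedSemaphore` as a stand-in for the semaphores in the Perl script. `process_profile` is a hypothetical placeholder for fetching and storing one profile. The main loop takes a semaphore slot before starting each thread, and each thread gives its slot back when it finishes, so no more than `max_threads` workers run at once.

```python
import threading
import time

def process_profile(uid, results, lock):
    """Hypothetical worker body: a real spider would fetch and parse the profile here."""
    time.sleep(0.01)  # simulate network latency
    with lock:
        results.append(uid)

def run(uids, max_threads=2, spawn_delay=0.0):
    """Process each UID in its own thread, with at most max_threads running at once."""
    slots = threading.BoundedSemaphore(max_threads)
    lock = threading.Lock()
    results = []
    threads = []
    state = {"active": 0, "peak": 0}  # track concurrency to show the limit holds

    def worker(uid):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            process_profile(uid, results, lock)
        finally:
            with lock:
                state["active"] -= 1
            slots.release()  # free the slot even if the worker fails

    for uid in uids:
        slots.acquire()  # blocks while max_threads workers are running
        t = threading.Thread(target=worker, args=(uid,))
        t.start()
        threads.append(t)
        time.sleep(spawn_delay)  # wait before spawning the next thread
    for t in threads:
        t.join()
    return sorted(results), state["peak"]

done, peak = run(range(5), max_threads=2, spawn_delay=0.01)
print(done)  # [0, 1, 2, 3, 4]
print("peak concurrency:", peak)  # never more than 2
```

Acquiring the slot in the spawning loop, rather than inside the worker, means the loop itself blocks when the limit is reached, so threads are never created just to sit waiting.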

Quick Start

Assuming you have all the necessary Perl modules and sqlite3 installed:

  1. Create a SQLite3 database:
    $ sqlite3 database.db 'CREATE TABLE userdata ( uid integer, name, friends, school, status, sex, concentration, residence, hometown, highschool, screenname, mobile, website, lookingfor, interestedin, relationshipstatus, politicalviews, interests, clubsjobs, favoritemusic, favoritemovies, favoritebooks );'
  2. Create a facebook.conf:
    $ cp facebook.conf.sample facebook.conf
    $ vim facebook.conf
  3. Start the script:
    $ ./facebook.pl -t 2 -s 10 -f database.db [SOME FACEBOOK UID]
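As an example of the kind of report the database makes easy, here is a hypothetical Python helper that counts profiles per value of one `userdata` column, such as `school`. It uses Python’s built-in `sqlite3` module; the sample rows are made up for illustration, and the demo table carries only a few of the columns from step 1.

```python
import sqlite3

# Columns from the userdata schema that make sense to group by.
ALLOWED_COLUMNS = {"school", "sex", "concentration", "residence",
                   "relationshipstatus", "politicalviews"}

def count_by(db, column):
    """Return (value, count) pairs for one userdata column, most common first."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    # Column names cannot be bound as parameters, so they are whitelisted instead.
    query = (f"SELECT {column}, COUNT(*) AS n FROM userdata "
             f"WHERE {column} IS NOT NULL GROUP BY {column} "
             f"ORDER BY n DESC, {column}")
    return db.execute(query).fetchall()

# Illustrative in-memory data; a real crawl would use sqlite3.connect("database.db").
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE userdata (uid integer, name, school, sex)")
db.executemany("INSERT INTO userdata VALUES (?, ?, ?, ?)", [
    (1, "Alice", "Berkeley", "Female"),
    (2, "Bob", "Berkeley", "Male"),
    (3, "Carol", "Stanford", "Female"),
])
print(count_by(db, "school"))  # [('Berkeley', 2), ('Stanford', 1)]
```

To run it against a real crawl, replace the in-memory database with `sqlite3.connect("database.db")`. SQL placeholders can only bind values, not column names, which is why the helper checks the column against a fixed list before formatting it into the query.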

I Want It!

The script has been removed at Facebook’s request.

Notes

I haven’t tested the script lately, but it should still work. If it doesn’t, post a comment, and I’ll release an update.

Since my script parses the HTML returned from Facebook, if Facebook makes any changes to their profile layouts, I’ll have to make major modifications to the code.

Future

I’m in the process of designing an interface to Facebook that resembles Google Maps. Users will be able to interactively visualize their friend network, and clicking a user’s “node” should bring up their Facebook profile in a new window. More details will be forthcoming.

Richard M. Stallman and Chris DiBona Pictures

Saturday, October 29th, 2005

I’ve uploaded pictures taken by CalLUG (UC Berkeley GNU/Linux Users Group) staff during the Richard M. Stallman and Chris DiBona speeches. The photos are organized into two albums:

  • Richard M. Stallman Speech
  • Chris DiBona Speech

Back in Business

Thursday, October 20th, 2005

Now that I’ve survived my first wave of midterms, I plan on devoting more time to my extracurricular activities.

I’ve noticed that my daily del.icio.us script for WordPress is broken, so I’ll be updating it to work again, and I’ll finally be addressing the nagging time-sync issue in that script. I also just finished setting up a Xen server for the System Administration for the Web class, so I’ll be sharing my experience building a Debian Xen server in a comprehensive Xen 2.0.7 HOWTO; words cannot describe how little documentation exists for Xen. Lastly, I’ll be updating some projects that are currently off-limits to the public, and I’ll release more information about them when they reach a usable state.

Chris DiBona Speech

Friday, October 14th, 2005

The UC Berkeley Linux Users Group will be hosting a speech by Chris DiBona, Open Source Program Manager for Google and a former editor of Slashdot, on October 17 from 6-8 PM at 306 Soda. As with all CalLUG events, the speech is open to the public and there will be no cost to attend.

If you’re interested in Linux and learning about the latest Google technology, be sure to come!

Bad Meets Evil

Tuesday, May 3rd, 2005


Penguin Meets A Huge Leech, originally uploaded by zeroion.

Perhaps the penguin represents Linux and the leech represents Microsoft, which has been “leeching” ideas from Apple and Linux. Unfortunately, I didn’t have any apples in my fridge…

Server Tweaking

Monday, May 2nd, 2005

I’ve just finished some intensive tweaking of the Cold Ray Hosting servers. I was able to reduce the memory used by Apache, our web daemon, and MySQL, our database daemon. By reducing the memory used by each daemon, I was able to release memory to other active processes.

The immediate benefit of my tweaks is a more responsive server, as evidenced by the faster loading of this blog. Furthermore, I’m confident that the optimizations will result in greater server stability and uptime.

coldray.com’s wiki

Sunday, April 24th, 2005

Now that all my midterms are over, I’ve been able to dedicate some time to working on my web hosting company, coldray.com. Our system has been up for a few months, and a user-friendly control panel is now undergoing testing. This weekend, I’ve added one more component to our web hosting offerings: self-service support in the form of a wiki.

coldray.com’s wiki is based on MediaWiki, the same wiki software that powers the venerable Wikipedia. Although MediaWiki doesn’t provide all the features I would like, I chose it because of its popularity and active development. Those two factors distinguish it from other open-source wiki programs: I can rest assured that security holes will be discovered and patched in a timely manner, and that more features will be implemented as the number of users and feature requests grows.

So far, I’ve populated the wiki with some basic information about the Control Panel, including how to create and delete email accounts, FTP accounts, subdomains, and MySQL databases.

Registration is currently required to edit, and only authorized users are allowed to register. I’m planning to relax these restrictions once I understand MediaWiki security better.

So What Else Do I Do?

Sunday, September 19th, 2004

During the weekends, when I have some free time, I’ve been (besides studying and sleeping):

Developing my ISP - pluto.betanegative.com went active last week (since renamed to mars.coldray.com). We’re currently hosted at ThePlanet with a modest amount of disk space and plenty of transfer. Since I’ve been busy with classes, I’ve only been able to configure Apache and some basic internal services. If time permits, I’ll set up Postfix and get some spiffy webmail going this weekend. Hopefully.

Making Beer - That’s right, I’m actually using my Berkeley chemistry education. If you saw two guys carrying a large plastic bucket and a glass carboy around Berkeley this weekend, that was my friend Kevin and me. Although we can legally produce beer, we can’t legally drink it. Go figure.

Spreading OSS on the Berkeley DC Hub - If you’re a Berkeley student living in the dorms, you definitely want to check this out. It’s an incredibly fast private Berkeley P2P network hosted by my friend Mark that doesn’t count towards the bandwidth cap.

zeroion.evilcoder.com is up!

Saturday, February 28th, 2004

I just installed Drupal on zeroion.evilcoder.com. I’m quite impressed by the number of plugins available for Drupal; a lot of functionality can be added to it. It’s also a lot better than PHP-Nuke, which suffers from over-design and just plain ugliness.

In other news, I continue my search for better blog software (does any exist?). So far, the ones I’ve looked at seem to be rather beta-quality at best; none seem to possess the maturity of MovableType. I’m quite disappointed, but I’m not giving up…

I’ll be posting more content to zeroion.evilcoder.com when I get more time.

Here Comes the Weekend…

Friday, February 27th, 2004

Just got out of my Chem 4B lab. The lab this week was pretty straightforward, so we finished early. And now, I can finally rest.

I’m going to meet Bryan of ABSK for dinner at 7 around Durant Square, and afterwards, read my Chemistry textbook (quantum chemistry is evil). Sometime after that, I’ll set up some test blogs at test.evilcoder.com to see which one I’ll migrate to (if I choose one, that is). Oh yeah, I’ll do zeroion.evilcoder.com at the same time.

Linux 2.6.4-rc1 was just released. The e100 network driver has been rewritten, among other changes, so I’m looking forward to improved network performance. I might just load the new kernel sometime tonight…