Wide Awake Developers


Hadoop versus VPN

I've been doing some work with Hadoop lately, and I just ran into an interesting problem with networking. This isn't a bug, per se, but a conflict in my configuration.

I'm running on a laptop, using a pseudo-distributed cluster. That means all the different Hadoop processes are running, but they're all on one box. This makes it possible to test jobs with full network communication, but without deploying to a production cluster.

I'm also working remotely, connecting to the corporate network by VPN. As is commonly done, our VPN is configured to completely separate the client machine from its local network. (If it didn't, you could use the VPN machine to bridge the secure corporate network to your home ISP, coffeeshop, airport, etc.)

Here's the problem: when on the VPN, my machine can't talk to its own IP address. Right now, ifconfig reports the laptop's IP address as 192.168.1.105. That's the address associated with the physical NIC on the machine.

The odd part is that Hadoop mostly works this way. I've configured the name node, job tracker, task tracker, datanodes, etc. to all use "localhost". I can use HDFS, I can submit jobs, and all the map tasks work fine. The only problem is that when the map tasks finish, the task tracker cannot send data from the map tasks to the reduce tasks. The job appears to hang.
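The split described here, where "localhost" works but the machine's own NIC address does not, can be checked outside of Hadoop. What follows is a minimal sketch in Python, not anything Hadoop itself does: the helper name `can_reach_self` is made up for illustration, and it assumes the VPN client blocks plain TCP connections to the local address. It binds a throwaway listener on an address and then tries to connect to it.

```python
import socket


def can_reach_self(address, timeout=2.0):
    """Return True if this machine can open a TCP connection to `address`.

    Hypothetical helper for illustration only. It binds a temporary
    listener on `address`, then connects to it from the same process.
    A False result means either the address is not local (bind failed)
    or the connection was blocked or timed out.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((address, 0))  # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            # The kernel completes the handshake for a listening socket,
            # so no explicit accept() is needed for the connect to succeed.
            with socket.create_connection((address, port), timeout=timeout):
                return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.168.1.105 is the NIC address from the post; substitute your own.
    for addr in ("127.0.0.1", "192.168.1.105"):
        print(addr, "reachable" if can_reach_self(addr) else "NOT reachable")
```

If the loopback address is reachable and the NIC address is not while the VPN is up, that would be consistent with the behavior described in this post.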

In the task tracker's log file, I see reports every 20 seconds or so that say:

2009-07-31 11:01:33,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200907310946_003_r_000000_0 0.0% reduce > copy >

The instant I disconnected from the VPN, the copy proceeded and the reduce task ran.
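Progress lines like the one above are regular enough to pick out with a script, which helps when watching for a stalled copy phase. This is a rough sketch: the regular expression is inferred only from the single log line quoted in this post, and the function names `parse_progress` and `is_stalled_copy` are hypothetical, not part of Hadoop.

```python
import re

# Pattern inferred from the one sample line in the post; other Hadoop
# versions may format TaskTracker progress lines differently.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\S+) (?P<percent>[\d.]+)% (?P<phase>.*)$"
)


def parse_progress(line):
    """Return a dict describing a progress line, or None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "attempt": match.group("attempt"),
        "percent": float(match.group("percent")),
        "phase": match.group("phase").strip(),
    }


def is_stalled_copy(line):
    """True if the line reports a reduce attempt sitting at 0% in the copy phase."""
    info = parse_progress(line)
    return (
        info is not None
        and info["percent"] == 0.0
        and info["phase"].startswith("reduce > copy")
    )


if __name__ == "__main__":
    sample = (
        "2009-07-31 11:01:33,992 INFO org.apache.hadoop.mapred.TaskTracker: "
        "attempt_200907310946_003_r_000000_0 0.0% reduce > copy >"
    )
    print(parse_progress(sample))
    print(is_stalled_copy(sample))
```

A single matching line is normal at the start of a reduce. It is the same attempt repeating at 0.0% for minutes that suggests the copy is stuck.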

I'm sure there's a configuration property somewhere within Hadoop that I can change. When (if) I find it, I'll update this post.

An AspectJ Circuit Breaker

Spiros Tzavellas pointed me to his implementation of Circuit Breaker. His approach uses AspectJ and can be applied using a bytecode weaver or AspectJ compiler. He's also got unit tests with 85% coverage.

Spiros' project page is here, and the code is (where else?) on GitHub. He appears to be quite actively developing the project.

Two New Circuit Breaker Implementations

The excellent Will Sargent has created a Circuit Breaker gem that's quite nice. You can read the docs at rdoc.info. He's released the code (under LGPL) on GitHub.

The other one has actually been out for a couple of months now, but I forgot to blog about it. Scott Vlamnick created a Grails plugin that uses AOP to weave Circuit Breaker functionality as "around" advice. This one can also report its state via JMX. In a particularly nice feature, this plugin supports different configurations in different environments.
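For readers who haven't seen the pattern applied as "around" advice, here is a minimal sketch of the idea in Python, using a decorator in place of AOP weaving. It is not the API of the Grails plugin, the Ruby gem, or the AspectJ implementation; every name in it (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`) is an assumption made for illustration.

```python
import functools
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker: closed -> open after N failures -> half-open."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock        # injectable so tests can control time
        self._failures = 0
        self._opened_at = None     # None means the breaker is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"     # timeout elapsed: allow one trial call
        return "open"

    def __call__(self, func):
        @functools.wraps(func)
        def around(*args, **kwargs):
            # "Around" behavior: decide whether to proceed, then observe
            # the outcome of the wrapped call.
            if self.state == "open":
                raise CircuitOpenError(func.__name__)
            try:
                result = func(*args, **kwargs)
            except Exception:
                self._record_failure()
                raise
            self._record_success()
            return result
        return around

    def _record_failure(self):
        self._failures += 1
        tripped = self._failures >= self.failure_threshold
        if self._opened_at is not None or tripped:
            # Trip the breaker, or re-open it after a failed trial call.
            self._opened_at = self._clock()

    def _record_success(self):
        self._failures = 0
        self._opened_at = None


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)

    @breaker
    def call_remote_service():
        raise ConnectionError("remote service is down")

    for _ in range(3):
        try:
            call_remote_service()
        except (ConnectionError, CircuitOpenError) as exc:
            print(type(exc).__name__, breaker.state)
```

The wrapped function knows nothing about the breaker, which is the appeal of the "around" approach: protection is applied from the outside, and the thresholds can vary per environment without touching the calling code.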