Wide Awake Developers

GMail Outage Was a Chain Reaction


Google has published an explanation of the widespread GMail outage from September 1st. In this explanation, they trace the root cause to a layer of “request routers”:

…a few of the request routers became overloaded and in effect told the rest of the system “stop sending us traffic, we’re too slow!”. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.

This perfectly describes the “Chain Reaction” stability antipattern from Release It!
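
To see how fast this dynamic runs away, here's a toy Scala simulation with made-up numbers (nothing like Google's real traffic): once the first couple of routers shed their load, every survivor gets pushed past its own capacity.

// Made-up numbers, purely illustrative.
val totalLoad = 1000.0          // requests/sec spread across the router pool
val capacityPerRouter = 110.0   // each router saturates above this rate
var healthy = 10

healthy -= 2   // "a few of the request routers became overloaded"
while (healthy > 0 && totalLoad / healthy > capacityPerRouter) {
  println(healthy + " routers carrying " + (totalLoad / healthy) + " req/s each: overloaded")
  healthy -= 1   // another router sheds its traffic onto the survivors
}
println("routers still serving: " + healthy)   // prints 0: the whole tier is gone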

Hadoop Versus VPN


I’ve been doing some work with Hadoop lately, and I just ran into an interesting problem with networking. This isn’t a bug, per se, but a conflict in my configuration.

I’m running on a laptop, using a pseudo-distributed cluster. That means all the different processes are running, but they’re all running on one box. That makes it possible to test jobs with full network communication, but without deploying to a production cluster.
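
For reference, a pseudo-distributed setup boils down to pointing a few properties in conf/hadoop-site.xml at localhost. This is roughly the stock quickstart recipe, not my exact configuration; your property file and port numbers may vary by Hadoop version:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>   <!-- single box, so no real replication -->
  </property>
</configuration>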

I’m also working remotely, connecting to the corporate network by VPN. As is commonly done, our VPN is configured to completely separate the client machine from its local network. (If it didn’t, you could use the VPN machine to bridge the secure corporate network to your home ISP, coffeeshop, airport, etc.)

Here’s the problem: when on the VPN, my machine can’t talk to its own IP address. Right now, ifconfig reports the laptop’s IP address as 192.168.1.105. That’s the address associated with the physical NIC on the machine.

The odd part is that Hadoop mostly works anyway. I’ve configured the name node, job tracker, task tracker, datanodes, etc. to all use “localhost”. I can use HDFS, I can submit jobs, and all the map tasks work fine. The only problem comes when the map tasks finish: the task tracker cannot copy their output to the reduce tasks, and the job appears to hang.

In the task tracker’s log file, I see reports every 20 seconds or so that say

2009-07-31 11:01:33,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200907310946_003_r_000000_0 0.0% reduce > copy >

The instant I disconnected from the VPN, the copy proceeded and the reduce job ran.

I’m sure there’s a configuration property somewhere within Hadoop that I can change. When (if) I find it, I’ll update this post.
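
In the meantime, the most promising lead I have is the slave.host.name property, which (if I’m reading hadoop-default.xml correctly) controls the address the TaskTracker and DataNode advertise to the rest of the cluster. Pinning it to loopback, which the VPN leaves alone, might look like this. Consider it a guess, not a verified fix:

<!-- In conf/hadoop-site.xml. A guess, not a verified fix: forces the
     daemons to advertise the loopback address instead of the physical
     NIC's address, which the VPN blocks. -->
<property>
  <name>slave.host.name</name>
  <value>127.0.0.1</value>
</property>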

An AspectJ Circuit Breaker


Spiros Tzavellas pointed me to his implementation of Circuit Breaker. His approach uses AspectJ and can be applied using a bytecode weaver or AspectJ compiler. He’s also got unit tests with 85% coverage.

Spiros’ project page is here, and the code is (where else?) on GitHub. He appears to be quite actively developing the project.

Two New Circuit Breaker Implementations


The excellent Will Sargent has created a Circuit Breaker gem that’s quite nice. You can read the docs at rdoc.info. He’s released the code (under LGPL) on GitHub.

The other one has actually been out for a couple of months now, but I forgot to blog about it. Scott Vlamnick created a Grails plugin that uses AOP to weave Circuit Breaker functionality as “around” advice. This one can also report its state via JMX. In a particularly nice feature, this plugin supports different configurations in different environments.
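
If you haven’t seen the pattern before, the core of it is small: count failures, trip open once a threshold is crossed and fail fast instead of hammering the sick dependency, then allow a trial call after a cooldown. Here’s a minimal Scala sketch of that state machine. It’s mine, not Will’s or Scott’s code, and it ignores niceties like per-failure-type policies:

class CircuitBreaker(failureThreshold: Int, resetTimeoutMillis: Long) {
  private var failures = 0   // consecutive failures seen so far
  private var openedAt = 0L  // when the breaker last tripped open

  def call[T](body: => T): T = synchronized {
    if (isOpen && !cooledDown)
      throw new IllegalStateException("circuit open: failing fast")
    try {
      val result = body   // if open but cooled down, this is the trial call
      failures = 0        // success closes the circuit again
      result
    } catch {
      case e: Exception =>
        failures += 1
        if (failures >= failureThreshold) openedAt = System.currentTimeMillis
        throw e
    }
  }

  private def isOpen = failures >= failureThreshold
  private def cooledDown =
    System.currentTimeMillis - openedAt > resetTimeoutMillis
}

// Usage, wrapping a hypothetical flaky call:
//   val breaker = new CircuitBreaker(5, 30000)
//   breaker.call { remoteService.fetch() }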

Workmen, Tools, Etc.


We’ve all heard the old saw, “It’s a poor workman that blames his tools.” Let’s think about that for a minute. Does it actually mean that a skilled craftsman can do great work with shoddy implements?

Well, can a chef make a souffle with a skillet?

Can a cabinetmaker round an edge with dull router bits?

I’m not going to rule it out. Perhaps there’s a brilliant chef who—at this very moment—is preparing to introduce the world to the “skiffle.” And, it’s possible that one could coax a dull router into making a better quarter round through care, attention, and good speed control.

Going by the odds, though, I’d bet on scrambled eggs and splinters.

Like a lot of old sayings, this one doesn’t make much sense in its usual interpretation. Most people take this proverb to mean that you should be able to turn out top-notch work with whatever tools you’re given. It’s an excuse for bad tools, or for lack of interest in improving them.

This homily dates back to a time when workers would bring their own tools to the job, leading to the popular origin story for the phrase “getting sacked”. (No comments about møøse bites, please.) Some crafts have evaded the assembly line, and in those, craftsmen still bring their own tools. Chefs bring their prized knives. Fine carpenters bring their own hand and bench tools.

There is a grain of truth in the common interpretation that good tools don’t make a good workman. There’s another level of truth under the surface, though. The 13th Century French version of this saying translates as, “A bad workman will never find a good tool.” I like this version a lot better. Tools cannot make one good, but bad tools can hurt a good worker’s performance. That sounds a lot less like “quit whining and use whatever’s at hand,” doesn’t it?

On the other hand, if you supply your own tools, you’re not as likely to tolerate bad ones, are you? I think this is the most important interpretation. Good workers—if given the choice—will select the best tools and keep them sharp.

Minireview: Beginning Scala


As you can probably tell from my recent posts, I’ve been learning Scala. I recently dug into another Scala book, Beginning Scala by David Pollak.

Beginning Scala is a nice, gentle introduction to this language. It takes a gradual, example-driven approach that emphasizes running code early. This makes it a good intro for people who want to use the language for applications first, then worry about creating frameworks later.

Don’t let that fool you, though. Pollak gets to the sophisticated parts soon enough. I particularly like an example of creating a new “control structure” to execute stuff in the context of a JDBC connection. This puts some meat on the argument that Scala is a “scalable language.” Where other languages implement this either as a keyword (as in Groovy’s “with”) or as a framework (Spring’s “templates”), here it can be added with one page of example code.
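
That JDBC example is essentially the loan pattern: acquire a resource, loan it to a caller-supplied block, and guarantee the cleanup. My reconstruction of the idea, not Pollak’s code from the book:

import java.sql.{Connection, DriverManager}

// Acquire a connection, hand it to the block, and always close it.
def withConnection[T](url: String)(body: Connection => T): T = {
  val conn = DriverManager.getConnection(url)
  try {
    body(conn)
  } finally {
    conn.close()
  }
}

// A call site reads like a built-in control structure:
//   withConnection("jdbc:...") { conn =>
//     conn.createStatement.executeQuery("select ...")
//   }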

Beginning Scala also has a very thorough discussion of actors. I appreciate this, because actors were my main motivation for learning Scala in the first place.

Pollak separates the act of consuming a library from that of creating a library. He advises us to worry most about types, traits, co- and contravariance, etc. mainly when we are creating libraries. True to this notion, chapter 7 is called “Traits and Types and Gnarly Stuff for Architects”. It doesn’t sound like much fun, but it is important material. I find that Scala makes me think more about the type system than other languages. It’s strongly, and statically, typed. (So much so, in fact, that it makes me realize just how loose Java’s own type system is.) As such, it pays to have a firm understanding of how code turns into types. Scala has a rich set of tools for building an expressive type system, but there is also complexity there. Checking in at 60 pages, this chapter covers Scala’s tools along with guidance on good styles and idioms.
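
To give one concrete taste of the variance material: Scala’s immutable List is declared covariant in its element type, so a list of a subtype substitutes for a list of the supertype, and the compiler enforces when that substitution is safe:

class Person
class Student extends Person

val students: List[Student] = List(new Student)
val people: List[Person] = students   // compiles: List is declared List[+A]

// Array, being mutable, is invariant; the equivalent assignment with
// Arrays would be rejected at compile time.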

Interestingly, although there is a Lift logo on the cover, there’s nothing about Lift in the book itself. Considering that Pollak is the creator of Lift, it’s curious that this book doesn’t deal with it. Perhaps that’s being left for another title.

Overall, I endorse Beginning Scala.

Units of Measure in Scala


Failure to understand or represent units has caused several major disasters, including the costly loss of the Mars Climate Orbiter in 1999. This is one of those things that DSLs often get right, but mainstream programming languages just ignore. Or, worse, they implement a clunky unit of measure library that ensures you can never again write a sensible arithmetic expression.

While I was at JAOO Australia this week, Amanda Laucher showed some F# code for a recipe that caught my attention. It used numeric literals that attached units directly to quantities. What’s more, it was intelligent about combining units.

I went looking for something similar in Scala. I googled my fingertips off, but without much luck, until Miles Sabin pointed out that there’s already a compiler plugin sitting right next to the core Scala code itself.

Installing Units

Scala has its own package manager, called sbaz. It can directly install the units extension:

sbaz install units

This will install it under your default managed installation. If you haven’t done anything else, that will be your Scala install directory. If you have done something else, you probably already know what you’re doing, so I won’t try to give you instructions.

Using Units

To use units, you first have to import the library’s “Preamble”. It’s also helpful to go ahead and import the “StandardUnits” object. That brings in a whole set of useful SI units.

I’m going to do all this from the Scala interactive interpreter.

scala> import units.Preamble._
import units.Preamble._

scala> import units.StandardUnits._
import units.StandardUnits._

After that, you can multiply any number by a unit to create a dimensional quantity:

scala> 20*m
res0: units.Measure = 20.0*m

scala> res0*res0
res1: units.Measure = 400.0*m*m

scala> Math.Pi*res0*res0
res2: units.Measure = 1256.6370614359173*m*m

Notice that when I multiplied a length (in meters) times itself, I got an area (square meters). To me, this is a really exciting thing about the units library. It can combine dimensions sensibly when you do math on them. In fact, it can help prevent you from incorrectly combining units.

scala> val length = 5*mm
length: units.Measure = 5.0*mm

scala> val weight = 12*g
weight: units.Measure = 12.0*g

scala> length + weight
units.IncompatibleUnits: Incompatible units: g and mm

I can’t add grams and millimeters, but I can multiply them.

Creating Units

The StandardUnits package includes a lot of common units relating to basic physics. It doesn’t have any relating to system capacity metrics, so I’d like to create some units for that.

scala> import units._
import units._

scala> val requests = SimpleDimension("requests")
requests: units.SimpleDimension = requests

scala> val req = SimpleUnit("req", requests, 1.0)
req: units.SimpleUnit = req

scala> val Kreq = SimpleUnit("Kreq", requests, 1000.0)
Kreq: units.SimpleUnit = Kreq

Now I can combine that simple dimension with others. If I want to express requests per second, I can just write it directly.

scala> 565*req/s
res4: units.Measure = 565.0*req/s
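
Dimensions should also cancel under multiplication, the same way they combined for meters above. I haven’t tested this exhaustively, so treat it as a sketch of what ought to happen:

(565*req/s) * (60*s)   // the seconds should cancel, leaving 33900.0*req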

Conclusion

This extension will be the first thing I add to new projects from now on. The convenience of literals, combined with the extensibility of adding my own dimensions and units, means I can easily keep units attached to all of my numbers.

There’s no longer any excuse to neglect your units in a mainstream programming language.

Kudos to Relevance and Clojure


It’s been a while since I blogged anything, mainly because most of my work lately has either been mind-numbing corporate stuff, or so highly contextualized that it wouldn’t be productive to write about.

Something came up last week, though, that just blew me away.

For various reasons, I’ve engaged Relevance to do a project for me. (Actually, the first results were so good that I’ve now got at least three more projects lined up.) They decided—and by “they”, I mean Stuart Halloway—to write the engine at the heart of this application in Clojure. That makes it sound like I was reluctant to go along, but actually, I was interested to see if the result would be as expressive and compact as everyone says.

Let me make a brief aside here and comment that I’m finding it much harder to be the customer on an agile project than to be a developer. I think there are two main reasons. First, it’s hard for me to keep these guys supplied with enough cards to fill an iteration. They’re outrunning me all the time. Big organizations like my employer just take a long time to decide anything. Second, there’s nobody else I can defer to when the team needs a decision. It often takes two weeks just for me to get a meeting scheduled with all of the stakeholders inside my company. That’s an entire iteration gone, just waiting to get to the meeting to make a decision! So, I’m often in the position of making decisions that I’m not 100% sure will be agreeable to all parties. So far, they have mostly worked out, but it’s a definite source of anxiety.

Anyway, back to the main point I wanted to make.

My personal theme is making software production-ready. That means handling all the messy things that happen in the real world. In a lab, for example, only one batch file ever needs to be processed at once. You never have multiple files waiting for processing, and files are always fully present before you start working on them. In production, that only happens if you guarantee it.

Another example, from my system. We have a set of rules (which are themselves written in Clojure code) that can be changed by privileged users. After changing the configuration, you can tell the daemonized Clojure engine to “(reload-rules!)”. The “!” at the end of that function means it’s an imperative with major side effects, so the rules get reloaded right now.

I thought I was going to catch them out when I asked, oh so innocently, “So what happens when you say (reload-rules!) while there’s a file being processed on the other thread?” I just love catching people when they haven’t dealt with all that nasty production stuff.

After a brief sidebar, Stu and Glenn Vanderburg decided that, in fact, nothing bad would happen at all, despite reloading rules in one thread while another thread was in the middle of using the rules.

Clojure uses a flavor of transactional memory, along with persistent data structures. No, that doesn’t mean they go in a database. It means that changes to a data structure can only be made inside of a transaction. The new version of the data structure and the old version exist simultaneously, for as long as there are outstanding references to them. So, in my case, that meant that the daemon thread would “see” the old version of the rules, because it had dereferenced the collection prior to the “reload-rules!” Meanwhile, the reload-rules! function would modify the collection in its own transaction. The next time the daemon thread comes back around and uses the reference to the rules, it’ll just see the new version of the rules.

In other words, two threads can both use the same reference, with complete consistency, because they each see a point-in-time snapshot of the collection’s state. The team didn’t have to do anything special to make this happen… it’s just the way that Clojure’s references, persistent data structures, and transactional memory work.
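
The shape of it, as a hypothetical Clojure sketch rather than the actual Relevance engine (load-rules and apply-rule stand in for the real functions):

(def rules (ref []))                       ; the current rule set

(defn reload-rules! []
  ;; Swap in the new rules inside a transaction. Readers holding the old
  ;; version keep a valid, immutable snapshot for as long as they need it.
  (dosync (ref-set rules (load-rules))))   ; load-rules is hypothetical

(defn process-file [file]
  (let [snapshot @rules]                   ; dereference once: a point-in-time view
    (doseq [rule snapshot]
      (apply-rule rule file))))            ; apply-rule is hypothetical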

Even though I didn’t get to catch Stu and Glenn out on a production readiness issue, I still had to admit that was pretty frickin’ cool.

JAOO Australia in 1 Month


The Australian JAOO conferences are now just one month away. I’ve wanted to get to Australia for at least ten years now, so I am thrilled to finally get there.

I’ll be delivering a tutorial on production ready software in both the Brisbane and Sydney conferences. This tutorial was a hit at QCon London, where I first delivered it. The Australian version will be further improved.

During the main conference, I’ll be delivering a two-part talk on common failure modes of distributed systems and how to recover from such breakage. These talks apply whether you’re building web-facing systems or internal shared services/SOA projects.

Quantum Backups


Backups are the only macroscopic system we commonly deal with that exhibits quantum mechanical effects. This is odd enough that I’ve spent some time getting tangled up in these observations.

Until you attempt a restore, a backup set is neither good nor bad, but a superposition of both. This is the superposition principle.

The peculiarity of the superposition principle is dramatically illustrated with the experiment of Schrödinger’s backup. This is when you attempt to restore Schrödinger’s pictures of his cat, and discover that the cat is not there.

In a startling corollary, if you use offsite vaulting, a second quantum variable is introduced, in that the backup set exists and does not exist simultaneously. A curious effect emerges upon applying the Hamiltonian operator: certain eigenvalues are always zero, revealing that prime-numbered tapes greater than 5 in a set never exist.

Finally, the Heisenbackup principle says that the user of a system is entangled with the system itself. As a result, within 30 days of consciously deciding that you do not need to run a backup, you will experience a complete disk crash. Because you’ve just read this, your 30 days start now.

Sorry about that.