Wide Awake Developers

Life’s Little Frustrations

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. -Leslie Lamport

On my way to QCon Tokyo and QCon China, I had some time to kill so I headed over to Delta’s Skyclub lounge. I’ve been a member for a few years now. And why not? I mean, who could pass up tepid coffee, stale party snacks, and a TV permanently locked to CNN? Wait… that actually doesn’t sound like such a hot deal.

Oh! I remember, it’s for the wifi access. (Well, that plus reliably clean bathrooms, but we need not discuss that.) Being able to count on wifi access without paying for yet another data plan has been pretty helpful for me. (As an aside, I might change my tune once I try a mifi box. Carrying my own hotspot sounds even better.)

Like most wifi providers, the Skyclub has a captive portal. Before you can get a TCP/IP connection to anything, you have to submit a form with a checkbox to agree to 89 pages of terms and conditions. I’m well aware that Delta’s lawyers are trying to make sure the company isn’t liable if I go downloading bootlegs of every Ally McBeal episode. But I really don’t know if these agreements are enforceable. For all I know, page 83 has me agreeing to 7 years of indentured servitude cleaning Delta’s toilets.

Anyway, Delta has outsourced operations of their wifi network to Concourse Communications. And apparently, they’ve had an outage all morning that has blocked anyone from using wifi in the Minneapolis Skyclubs. When I submit the form with the checkbox, I get the following error page:

Including this bit of stacktrace:

There’s a lot to dislike here.

  1. Why is this yelling at me, the user? To anyone who isn’t a web site developer, this makes it sound like the user did something wrong. There’s a ton of scary language here: "instance-specific error", "allow remote connections", "Named Pipes Provider"… heck, this sounds like it’s accusing the user of hacking servers. "Stack trace" sure sounds like the Feds are hot on somebody’s trail, doesn’t it?
  2. Isn’t it fabulous to know that Ken keeps his projects on his D: drive? If I had to lay bets, I’d say that Ken screwed up his configuration string. In fact, the whole problem smells like a failed deployment or poorly executed change. Ken probably pushed some code out late on a Friday afternoon, then boogied out of town. My prediction (totally unverifiable, of course) is that this problem will take less than 5 minutes to resolve, once Ken gets his ass back from the beach.
  3. We mere users get to see quite a bit of internal information here. Nothing really damaging, unless of course Wilson ORMapper has some security defects or something like that.
  4. Stepping back from this specific error message, we have the larger question: is it sensible to couple availability of the network to the availability of this check-the-box application? Accessing the network is the primary purpose of this whole system. It is the most critical feature. Is collecting a compulsory boolean "true" from every user really as important as the reason the whole damn thing was built in the first place? Of course not! (As an aside, this is an example of Le Chatelier’s Principle: "Complex systems tend to oppose their own proper function.")
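
Point 4 can be made concrete with a "fail open" sketch: the portal still tries to record the checkbox, but a failure in that step does not block network access. This is only an illustration under stated assumptions. The names `admit_user` and `record_acceptance` are hypothetical and have nothing to do with the actual Skyclub portal, and whether failing open is acceptable is a business and legal call, not a technical one.

```python
import logging

logger = logging.getLogger("portal")

def admit_user(user_id, record_acceptance):
    """Grant network access; treat the terms-acceptance record as best-effort.

    `record_acceptance` is a hypothetical callable that stores the checkbox
    result (for example, in a database) and may raise if that store is down.
    """
    try:
        record_acceptance(user_id)
        recorded = True
    except Exception:
        # The non-critical feature failed. Log it for later follow-up,
        # but do not let it take down the critical feature.
        logger.exception("could not record acceptance for %s; admitting anyway", user_id)
        recorded = False
    return {"user": user_id, "network_access": True, "acceptance_recorded": recorded}

def broken_store(user_id):
    # Stand-in for the failed database connection shown in the error page.
    raise ConnectionError("database unreachable")

outcome = admit_user("guest-42", broken_store)
```

With the store down, `outcome` still grants access and flags the missing record so it can be reconciled afterward.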

We see this kind of operational coupling all the time. Non-critical features are allowed to damage or destroy critical features. Maybe there’s a single thread pool that services all kinds of requests, rather than reserving a separate pool for the important things. Maybe a process is overly linearized and doesn’t allow for secondary, after-the-fact processing. Or, maybe a critical and a non-critical system both share an enterprise service—producing a common-mode dependency.
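
The thread pool case is easy to sketch. Assuming a service with one critical kind of request and one non-critical kind (the function names here are invented for illustration), giving each its own pool means a pile-up of slow non-critical work cannot starve the critical path:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Separate pools: a backlog in one cannot consume the other's threads.
critical_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="critical")
background_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="background")

release = threading.Event()

def slow_report():
    # Hypothetical non-critical task that blocks until released.
    release.wait(timeout=5)
    return "report done"

def connect_user(user):
    # Hypothetical critical task.
    return f"{user} connected"

# Saturate the background pool with far more work than it has threads.
stuck = [background_pool.submit(slow_report) for _ in range(10)]

# The critical request still completes promptly, because it has its own pool.
critical_result = critical_pool.submit(connect_user, "alice").result(timeout=1)

release.set()
background_results = [f.result(timeout=5) for f in stuck]
critical_pool.shutdown()
background_pool.shutdown()
```

Had both kinds of work shared one two-thread pool, the `connect_user` call would have queued behind the ten blocked reports.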

Whatever the proximate cause, the underlying problem is lack of diligence in operational decoupling.

Topics in Architecture

I’m working on a syllabus for an extensive course on web architecture. This will be for experienced programmers looking to become architects.

Like all of my work about architecture, this covers technology, business, and strategic aspects, so there’s an emphasis on creating high-velocity, competitive organizations.

In general, I’m aiming for a mark that’s just behind the bleeding edge. So, I’m including several of the NoSQL persistence technologies, for example, but not including Erjang because it’s too early. (Or is that “erl-y”?)

(What I’d really love to do is make a screencast series out of all of these. I’m daunted, though. There’s a lot of ground to cover here!)

EDIT: Added functional and OO styles of programming. (Thanks @deanwampler.) Added JRuby/Java under languages. (Thanks @glv.)

I’m interested in hearing your feedback. What would you add? Remove?

  • Methods and Processes

    • Systems Thinking/Learning Organization
    • High Velocity Organizations
    • Safety Culture
    • Error-Inducing Systems (“Normal Accidents”)
    • Points of Leverage
    • Fundamental Dynamics: Iteration, Variation, Selection, Feedback, Constraint
    • 5D architecture
    • Failures of Intuition
    • ToC
    • Critical Chain
    • Lean Software Development
    • Real Options
    • Strategic Navigation
    • OODA
    • Tempo, Adaptation
    • XP
    • Scrum
    • Lean
    • Kanban
    • TDD
  • Architecture Styles

    • REST / ROA
    • SOA
    • Pipes & Filters
    • Actors
    • App-server centric
    • Event-Driven Architecture
  • Web Foundations

    • The “architecture” of the web
    • HTTP 1.0 & 1.1
    • Browser fetch behaviors
    • HTTP Intermediaries
  • The Nature of the Web

    • Crowdsourcing
    • Folksonomy
    • Mashups/APIs/Linked Open Data
  • Testing

    • TDD
    • Unit testing
    • BDD/Spec testing
    • ScalaCheck
    • Selenium
  • Persistence

    • Redis
    • CouchDB
    • Neo4J
    • eXist
    • “Web-shaped” persistence
  • Technical architecture

    • 8 Fallacies of Distributed Computing
    • CAP Theorem
    • Scalability
    • Reliability
    • Performance
    • Latency
    • Capacity
    • Decoupling
    • Safety
  • Languages and Frameworks

    • Spring
    • Groovy/Grails
    • Scala
      • Lift
    • Clojure
      • Compojure
    • JRuby
      • Rails
    • OSGi
  • Design

    • Code Smells
    • Object Thinking
    • Object Design
    • Functional Thinking
    • API Design
    • Design for Operations
    • Information Hiding
    • Recognizing Coupling
  • Deployment

    • Physical
    • Virtual
    • Multisite
    • Cloud (AWS)
    • Chef
    • Puppet
    • Capistrano
  • Build and Version Control

    • Git
    • Ant
    • Maven
    • Leiningen
    • Private repos
    • Collaboration across projects

“If the Last One Goes, We’ll Be Up Here All Night!”

There’s an old joke about a couple of folks on a plane who hear the captain successively announce that they’ve lost one, two, then three engines. Each time, he reassures the passengers that they’re OK, but will be progressively later to land. After losing the third engine, one passenger tells the other, “If the last one goes, we’ll be up here all night!”

It’s a remarkable aircraft that can fly on just one out of four engines. Most four-engine jets need at least two to cruise. (I’ve been told that they can make a controlled descent on one engine, but can’t maintain altitude.)

Likewise, your web app probably needs more than just one functioning server to handle demand. The usual approach to computing availability is to compute the odds that at least one server survives:

rel_3-1a.png

If all the servers are identical, meaning that we expect them to have the same failure rate, then this reduces to the more familiar form:

rel_3-1b.png
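
The two formulas above survive only as image filenames, so here is a hedged reconstruction of the calculation they describe, assuming independent server failures. `at_least_one` is the "at least one server survives" computation. `at_least_k_identical` is an added generalization (not taken from the original post) for the case where, like the four-engine jet, you need more than one survivor.

```python
from math import comb, prod

def at_least_one(availabilities):
    """P(at least one server is up) = 1 - P(every server is down).

    Assumes servers fail independently; each entry is one server's
    probability of being up.
    """
    return 1 - prod(1 - a for a in availabilities)

def at_least_k_identical(n, k, a):
    """P(at least k of n identical, independent servers are up).

    Sums the binomial probabilities of exactly i servers being up,
    for i from k through n.
    """
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))

three_servers_need_one = at_least_one([0.9, 0.9, 0.9])
four_servers_need_two = at_least_k_identical(4, 2, 0.9)
```

With identical servers at 90% availability each, needing one of three gives roughly 0.999, while needing two of four gives roughly 0.9963. The numbers are illustrative, not measurements.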

Coupling and Coevolution

The mighty Mississippi River starts in Minnesota, at Lake Itasca. Every kid in Minnesota has to make the ritual pilgrimage to Itasca State Park at some point, where wading across North America’s longest river is a rite of passage.

Mississippi River Starts Here

One of the very interesting things in Itasca State Park is a section of forest that is fenced off so that deer cannot enter it. It’s part of a decades-long experiment to see how forests are affected by browsing herbivores. What’s really interesting is that not only is the quantity of plants different inside the protected area, but the types of plants and trees are different, too. Because deer prefer to nibble on younger trees, fewer saplings survive in the main body of the forest than in the fenced-off portion. Outside the fence, the distribution of tree size and age is biased toward older trees. The population of trees is weighted more toward resinous species like pines, which deer prefer not to eat. Inside the fence, more saplings survive into young maturity, so you see a more even distribution of tree ages and a wider diversity of species represented in the mature trees. The changes in the canopy affect the ground cover which, in turn, changes how deer could (if allowed) reach the trees and browse them.

So, here’s a feedback loop that involves deer, trees, leaves and brush. The net result is a different ecosystem (albeit a slightly artificial one).

Most physical and biological systems are like this in several ways, particularly relating to feedback. In our artificial systems (electrical, mechanical, symbolic, or semantic) we build in feedback mechanisms as a deliberate control. These are often one dimensional, proportional, and negative.
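
For contrast with what follows, this is roughly what "one dimensional, proportional, and negative" looks like in code: a single measured value, a correction proportional to the error, applied so as to shrink that error. The thermostat framing and the numbers are invented for illustration.

```python
def proportional_step(value, setpoint, gain):
    """One step of proportional negative feedback.

    The correction is proportional to the error and pushes the value
    toward the setpoint, so the error shrinks (for 0 < gain < 2).
    """
    error = setpoint - value
    return value + gain * error

temperature = 15.0
readings = [temperature]
for _ in range(20):
    temperature = proportional_step(temperature, setpoint=20.0, gain=0.5)
    readings.append(temperature)
```

Each step halves the remaining error, so `readings` climbs from 15.0 toward 20.0 without overshooting. One variable, one loop, one direction: nothing like the web of interactions in the forest.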

In natural systems, feedback arises everywhere. Sometimes it proves helpful for the long-term stability of the system, in which case the feedback itself gets reinforced by the existence and perpetuation of the system it exists within. In a sense, the system adapts to reinforce beneficial feedback. Conversely, feedback webs that cause too much instability will, like an overly aggressive virus, lead to destruction of their host system and disappear. So, we can see the constituents of a system co-evolving with each other and the system itself.

The old “microphone-amplifier-speaker-squealing” example of feedback really fails here. We lack both language and metaphor to really grasp this kind of interaction over time. In part, I think that’s because we like to separate the world into isolated components and only talk about components at a single level of abstraction. The trouble is that abstractions like “level of abstraction” only exist in our minds.

Here’s another example of coevolution, courtesy of Jared Diamond in “Guns, Germs, and Steel”. I’ll apologize in advance for oversimplifying; I’m devoting a paragraph to an argument he develops across entire chapters.

At some point, a group of nomads decided that the seeds of these particular grasses were tasty. In collecting the grasses, they spread them around. Some kinds of seeds survived the winter better and responded well to being sown by humans. Now, nobody sat down and systematically picked out which seeds grew better or worse. They didn’t have to, because the seeds that grew better produced more seeds for the next generation. Over time, a tiny difference (fractions of a percent) in productivity would lead some strains to supplant the others. Meanwhile, inextricably linked, some humans figured out how to plant, harvest, and eat these early grains. These humans had an advantage over their neighbors, so they were able to feed more babies. That turns out to be a benefit, because farming is hard work and requires more offspring to help produce food. (Another feedback loop.) Oh, and this kind of labor makes it advantageous to keep livestock, too. Over time, these farmers would breed and feed more children than the nomads, so farmers would come to be a larger and larger percentage of the population. Just as an added wrinkle, keeping livestock and fertilizing fields both lead to diseases that simultaneously harm the individuals and occasionally decimate the population, but also provide some long-term benefits such as better disease resistance and inadvertent biological warfare when encountering other civilizations.

Try to diagram the feedback loops here: nomads, farmers, livestock, grains, birthrates, and so on. Everything is connected to everything else. It’s really hard to avoid slipping into teleological language here. We’ve got feedback and feedforward at several different levels and timescales here, from the scale of microbes to livestock to civilizations, and across centuries. This dynamic altered the course of many species’ evolution: cattle, wheat, maize, and yes, good old H. sapiens.
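
The claim that a fraction of a percent is enough can be checked with a little arithmetic. The sketch below assumes two strains that start with equal shares, where one yields 0.5% more seed per generation. Both figures are made up for illustration and are not from Diamond.

```python
def favored_share(generations, advantage=0.005, initial_share=0.5):
    """Share of the seed stock held by the slightly more productive strain.

    Each generation the favored strain multiplies by (1 + advantage)
    relative to the other, so the ratio between them compounds.
    """
    ratio = initial_share / (1 - initial_share)
    ratio *= (1 + advantage) ** generations
    return ratio / (1 + ratio)

shares = {n: favored_share(n) for n in (0, 100, 500, 1000)}
```

Under these assumptions the favored strain goes from half the stock to more than 99% of it within a thousand generations, with nobody ever choosing it on purpose.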

This complexity of interaction extends to planetary and stellar levels as well. At some sufficiently long time scale, the intergalactic medium is coupled to our planetary ecosystem.

The human intellectual penchant for decomposition, isolation, and leveled abstraction is purely an artifact of the size of our bodies and the duration of our lives.

GMail Outage Was a Chain Reaction

Google has published an explanation of the widespread GMail outage from September 1st. In this explanation, they trace the root cause to a layer of “request routers”:

…a few of the request routers became overloaded and in effect told the rest of the system “stop sending us traffic, we’re too slow!”. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.

This perfectly describes the “Chain Reaction” stability antipattern from Release It!
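
The dynamic in Google's description can be reproduced with a toy model. This is a deliberately crude sketch: the capacities and loads are invented, load is assumed to spread evenly across healthy routers, and an overloaded router simply drops out.

```python
def simulate_chain_reaction(capacities, total_load):
    """Return the count of healthy routers after each redistribution round.

    Any router whose even share of the load exceeds its capacity drops
    out, and its share is spread over the routers that remain.
    """
    healthy = sorted(capacities)
    history = [len(healthy)]
    while healthy:
        share = total_load / len(healthy)
        survivors = [c for c in healthy if c >= share]
        if len(survivors) == len(healthy):
            break  # every remaining router can carry its share: stable
        healthy = survivors
        history.append(len(healthy))
    return history

capacities = [90, 95, 105, 110, 120, 130]
stable = simulate_chain_reaction(capacities, total_load=500)     # → [6]
collapse = simulate_chain_reaction(capacities, total_load=600)   # → [6, 4, 0]
```

At a load of 500 every router copes. At 600 the two weakest routers drop out, which pushes each survivor's share from 100 to 150, and that takes out all the rest.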

Hadoop Versus VPN

I’ve been doing some work with Hadoop lately, and I just ran into an interesting problem with networking. This isn’t a bug, per se, but a conflict in my configuration.

I’m running on a laptop, using a pseudo-distributed cluster. That means all the different processes are running, but they’re all running on one box. That makes it possible to test jobs with full network communication, but without deploying to a production cluster.

I’m also working remotely, connecting to the corporate network by VPN. As is commonly done, our VPN is configured to completely separate the client machine from its local network. (If it didn’t, you could use the VPN machine to bridge the secure corporate network to your home ISP, coffeeshop, airport, etc.)

Here’s the problem: when on the VPN, my machine can’t talk to its own IP address. Right now, ifconfig reports the laptop’s IP address as 192.168.1.105. That’s the address associated with the physical NIC on the machine.

The odd part is that Hadoop mostly works this way. I’ve configured the name node, job tracker, task tracker, datanodes, etc. to all use “localhost”. I can use HDFS, I can submit jobs, and all the map tasks work fine. The only problem is that when the map tasks finish, the task tracker cannot send data from the map tasks to the reduce tasks. The job appears to hang.

In the task tracker’s log file, I see reports every 20 seconds or so that say

2009-07-31 11:01:33,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200907310946_003_r_000000_0 0.0% reduce > copy >

The instant I disconnected from the VPN, the copy proceeded and the reduce job ran.

I’m sure there’s a configuration property somewhere within Hadoop that I can change. When (if) I find it, I’ll update this post.
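
The symptom can be confirmed without Hadoop in the picture. The sketch below is a generic diagnostic, not anything Hadoop provides: it listens on a given local address and tries to connect back to it. The function name is made up for this example.

```python
import socket

def can_reach_own_address(ip, timeout=2.0):
    """Listen on `ip` at an ephemeral port and try to connect back to it.

    Returns True if the connection succeeds and False if it fails or times
    out. Raises OSError from bind() if `ip` is not assigned to this machine.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((ip, 0))  # port 0 lets the OS choose a free port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

loopback_ok = can_reach_own_address("127.0.0.1")
```

Loopback should succeed with or without the VPN. Running the same check against the NIC address (192.168.1.105 in this case) while connected to the VPN would presumably return False, which would match the stalled copy phase.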

An AspectJ Circuit Breaker

Spiros Tzavellas pointed me to his implementation of Circuit Breaker. His approach uses AspectJ and can be applied using a bytecode weaver or AspectJ compiler. He’s also got unit tests with 85% coverage.

Spiros’ project page is here, and the code is (where else?) on GitHub. He appears to be quite actively developing the project.

Two New Circuit Breaker Implementations

The excellent Will Sargent has created a Circuit Breaker gem that’s quite nice. You can read the docs at rdoc.info. He’s released the code (under LGPL) on GitHub.

The other one has actually been out for a couple of months now, but I forgot to blog about it. Scott Vlamnick created a Grails plugin that uses AOP to weave Circuit Breaker functionality as “around” advice. This one can also report its state via JMX. In a particularly nice feature, this plugin supports different configurations in different environments.
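
For readers who haven't met the pattern, here is a minimal Circuit Breaker sketch. It is not the API of either library mentioned above, and every name in it is invented for illustration. After a run of consecutive failures the breaker opens and fails fast. Once a timeout has passed it lets a single trial call through.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each call to the protected integration point as `breaker.call(fetch, url)` and treats `CircuitOpenError` as "don't even try right now".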

Workmen, Tools, Etc.

We’ve all heard the old saw, “It’s a poor workman that blames his tools.” Let’s think about that for a minute. Does it actually mean that a skilled craftsman can do great work with shoddy implements?

Well, can a chef make a souffle with a skillet?

Can a cabinetmaker round an edge with dull router bits?

I’m not going to rule it out. Perhaps there’s a brilliant chef who—at this very moment—is preparing to introduce the world to the “skiffle.” And, it’s possible that one could coax a dull router into making a better quarter round through care, attention, and good speed control.

Going by the odds, though, I’d bet on scrambled eggs and splinters.

Like a lot of old sayings, this one doesn’t make much sense in its usual interpretation. Most people take this proverb to mean that you should be able to turn out top-notch work with whatever tools you’re given. It’s an excuse for bad tools, or lack of interest in improving them.

This homily dates back to a time when workers would bring their own tools to the job, leading to the popular origin story for the phrase “getting sacked”. (No comments about møøse bites, please.) Some crafts have evaded the assembly line, and in those, craftsmen still bring their own tools. Chefs bring their prized knives. Fine carpenters bring their own hand and bench tools.

There is a grain of truth in the common interpretation that good tools don’t make a good workman. There’s another level of truth under the surface, though. The 13th Century French version of this saying translates as, “A bad workman will never find a good tool.” I like this version a lot better. Tools cannot make one good, but bad tools can hurt a good worker’s performance. That sounds a lot less like “quit whining and use whatever’s at hand,” doesn’t it?

On the other hand, if you supply your own tools, you’re not as likely to tolerate bad ones, are you? I think this is the most important interpretation. Good workers—if given the choice—will select the best tools and keep them sharp.

Minireview: Beginning Scala

As you can probably tell from my recent posts, I’ve been learning Scala. I recently dug into another Scala book, Beginning Scala by David Pollak.

Beginning Scala is a nice, gentle introduction to this language. It takes a gradual, example-driven approach that emphasizes running code early. This makes it a good intro for people who want to use the language for applications first, then worry about creating frameworks later.

Don’t let that fool you, though. Pollak gets to the sophisticated parts soon enough. I particularly like an example of creating a new “control structure” to execute stuff in the context of a JDBC connection. This puts some meat on the argument that Scala is a “scalable language.” Where other languages either implement this as a keyword (as in Groovy’s “with”) or a framework (Spring’s “templates”), here it can be added with one page of example code.
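
The shape of that "control structure" is simple: a function that opens a resource, hands it to a block of code, and guarantees cleanup afterward. The sketch below is a rough Python rendering of the idea, not the book's Scala code. It uses an in-memory SQLite database as a stand-in for a JDBC connection, and `using` is an invented name.

```python
import sqlite3

def using(open_resource, body):
    """Run `body` with a freshly opened resource, closing it no matter what."""
    resource = open_resource()
    try:
        return body(resource)
    finally:
        resource.close()

def count_posts(conn):
    # The body only deals with the work, never with opening or closing.
    conn.execute("CREATE TABLE posts (title TEXT)")
    conn.execute("INSERT INTO posts VALUES ('Beginning Scala')")
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

row_count = using(lambda: sqlite3.connect(":memory:"), count_posts)
```

In Scala the equivalent reads like built-in syntax at the call site, which is the point Pollak's example makes.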

Beginning Scala also has a very thorough discussion of actors. I appreciate this, because actors were my main motivation for learning Scala in the first place.

Pollak separates the act of consuming a library from that of creating a library. He advises us to worry most about types, traits, co- and contravariance, etc. mainly when we are creating libraries. True to this notion, chapter 7 is called “Traits and Types and Gnarly Stuff for Architects”. It doesn’t sound like much fun, but it is important material. I find that Scala makes me think more about the type system than other languages. It’s strongly, and statically, typed. (So much so, in fact, that it makes me realize just how loose Java’s own type system is.) As such, it pays to have a firm understanding of how code turns into types. Scala has a rich set of tools for building an expressive type system, but there is also complexity there. Checking in at 60 pages, this chapter covers Scala’s tools along with guidance on good styles and idioms.

Interestingly, although there is a Lift logo on the cover, there’s nothing about Lift in the book itself. Considering that Pollak is the creator of Lift, it’s curious that this book doesn’t deal with it. Perhaps that’s being left for another title.

Overall, I endorse Beginning Scala.