Wide Awake Developers

Bad Layering

If I had to guess, I would say that “Layers” is probably the most commonly applied architecture pattern. And why not? Parfaits have layers, and who doesn’t like a parfait? So layers must be good.

Like everything else, though, there’s a good way and a bad way.

The usual Neapolitan stack looks like this:

On one of my favorite projects of all, we used more layers because we wanted to further isolate different behaviors. In that project, we added a “UI Model” distinct from the “Domain.”

We impose this style because we want to separate concerns. This should provide us with two big benefits. First, we can change the contents of each layer independently. So changes to the GUI should not affect the domain, and changes to the domain should not affect persistence. The second benefit we want is the ability to substitute a layer. We may swap out a layer for the sake of testing (often in the case of persistence layers) or for different product configurations.

People sometimes make an argument for swapping out a layer in case of technology change. That argument is used for ORMs in the persistence layer, but I don’t find it convincing. Changing persistence on an existing application is far from the most common kind of change. You’d be buying an expensive option that is seldom exercised.

When Good Layers Go Bad

The trouble arises when the layers are built such that we have to drill through several of them to do something common. Have you ever checked in a commit that had a bunch of new files like “Foo”, “FooController”, “FooForm”, “FooFragment”, “FooMapper”, “FooDTO”, and so on? That, dear reader, is a breakdown in layering.

It comes from each layer being decomposed along the same dimension, in this case by domain concept. That means the domain layer is dominating the other layers.

I would much rather see each layer have objects and functions that express the fundamental concepts of that layer. “Foo” is not a persistence concept, but “Table” and “Row” are. “Form” is a GUI concept, as is “Table” (but a different kind of table than the persistence one!) The boundary between each layer should be a matter of translating concepts.

In the UI, a domain object should be atomized into its constituent attributes and constraints. In persistence, it should be atomized into rows in one or more tables (in SQL-land) or one or more linked documents.

What appears as a class in one layer should be mere data to every other layer.

How Does It Happen?

This breakdown in layering can arise from more than one dynamic process.

  1. The application framework may impose this structure.
  2. The language may not have abstractions powerful enough to make it pleasant to work with data.
  3. TDD without enough refactoring. Each thin slice through the application adds one more strand of “Foo and Friends”. Truly merciless refactoring would pull out the common behavior sideways into the layer-specific concepts I described above. Lacking merciless refactoring, the project will accrete sticky strands like cotton candy on a toddler.
  4. The team may not have seen it done any other way.

What If It Happens To You?

Maybe you already have degenerate layers. Assuming they aren’t required by your framework, start looking for opportunities to refactor. Don’t just build a class hierarchy so you can inherit implementations. Rather, look for common patterns of interaction. Figure out how to turn the code you’ve got in classes into data acted on by classes relevant to the layer.

Use maps. Convert objects into maps from field identifier to an object that represents the salient aspect of the field for that layer:

  • For a GUI, those aspects will be something like “lexical type”, “editable”, “constraint” / “validation”, “semantic class”, and so on.
  • For persistence, they will deal with “length”, “representation format”, “referent,” etc.
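
Here is a minimal sketch of what such maps might look like, in Python. The field names, aspect keys, table name, and helper functions are all hypothetical, chosen only for illustration. The point is that each layer gets generic code that walks data, instead of a “FooForm” or a “FooMapper” per domain concept.

```python
# Hypothetical example: one "customer" concept described as plain data,
# once per layer. None of these names come from a real framework.

# GUI layer: what the form code needs to know about each field.
CUSTOMER_GUI = {
    "name":  {"lexical_type": "text",  "editable": True,  "required": True},
    "email": {"lexical_type": "email", "editable": True,  "required": True},
    "id":    {"lexical_type": "text",  "editable": False, "required": False},
}

# Persistence layer: what the storage code needs to know about the same fields.
CUSTOMER_PERSISTENCE = {
    "name":  {"column": "cust_name",  "sql_type": "VARCHAR", "length": 80},
    "email": {"column": "cust_email", "sql_type": "VARCHAR", "length": 254},
    "id":    {"column": "cust_id",    "sql_type": "CHAR",    "length": 36},
}


def validate(gui_fields, values):
    """Generic GUI-layer validation: works for any field map, not just customers."""
    errors = {}
    for field, aspects in gui_fields.items():
        if aspects["required"] and not values.get(field):
            errors[field] = "required"
    return errors


def insert_statement(table, persistence_fields, values):
    """Generic persistence-layer code: builds a parameterized INSERT from the map."""
    fields = [f for f in persistence_fields if f in values]
    columns = ", ".join(persistence_fields[f]["column"] for f in fields)
    placeholders = ", ".join("?" for _ in fields)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, [values[f] for f in fields]
```

Adding a new domain concept under this scheme means adding two more maps, not two more classes.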

Seek and destroy DTOs. They should be maps.

A DTO clearly indicates that your class is crossing a boundary. And yet, it requires that code on both sides of the boundary codes to the method signatures of the DTO. That means there is precisely zero translation at the boundary.

Where To Go From Here

Let me be clear, I like parfaits. (Yogurt and fruit! Ice cream, nuts, caramel!) I have nothing against layers. Most of my applications are built from layers. It’s just that getting the benefits we seek requires more effort than smearing a single domain concept across multiple subdirectories.

If “Layers” is the only architecture pattern you’ve used, then you’re in for a treat. There are plenty of other fundamental structures to explore. Pipes and filters. Blackboard. Components. Set GoF aside and go read Pattern-Oriented Software Architecture. The whole series is a treasure trove and an encyclopedia.

People Don’t Belong to Organizations

One company that gets this right is GitHub. I exist as my own person there. I’m affiliated with my employer as well as other organizations.

We are long past the days of “the company man,” when a person’s identity was solely bound to their employer. That relationship is much more fluid now.

A company that gets it wrong is Atlassian. I’ve left behind a trail of accounts in various Jirae and Confluences. Right now, the biggest offender in their product lineup is HipChat. My account is identified by my email address, but it’s bound up with an organization. If I want to be part of my employer’s HipChat as well as a client’s, I have to resort to multiple accounts signed up with plus addresses. It’s great that Gmail supports that, but I still can’t log in to more than one account at a time.

More generally, this is a failure in modeling. Somewhere along the line, somebody drew a line between `Organization` and `Person` on their model, with a one-to-many relationship. One `Organization` can have many `Person` entities, but each `Person` belongs to exactly one `Organization`.

I’ll go even further. The proper way to approach this today is to relate `Organization` and `Person` by way of another entity. Reify the association! Is it employment? Put the start and end dates on the employment. Oh, and don’t delete the association once it ends… that’s erasing it from history.
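
A minimal sketch of a reified association, in Python. The class and field names (`Employment`, `start`, `end`, `active_on`) and the sample data are my own assumptions for illustration; the shape is what matters. The association is an entity with its own attributes, and ending it sets a date rather than deleting a row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Organization:
    name: str


@dataclass
class Employment:
    """The association itself, with its own attributes."""
    person: Person
    organization: Organization
    start: date
    end: Optional[date] = None  # None means the employment is still current

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def organizations_of(person, employments, day):
    """Every organization the person is affiliated with on a given day."""
    return [e.organization for e in employments
            if e.person == person and e.active_on(day)]


# Hypothetical data: one person, two overlapping affiliations.
pat = Person("Pat")
acme = Organization("Acme")
client = Organization("Client Co")
history = [
    Employment(pat, acme, date(2008, 1, 1), date(2010, 6, 30)),  # ended, but kept
    Employment(pat, client, date(2010, 3, 1)),                   # still current
]
```

Because the ended employment stays in `history`, a query for an earlier date still answers correctly.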

I think the default for pretty much any relationship these days should be many-to-many. Particularly any data relationship that models a real relationship in the external world. We shouldn’t let the bad old days of SQL join tables deter us from doing the right thing now.

Glue Fleet and Compojure Together Using Protocols

Inspired by Glenn Vanderburg’s article on Clojure templating frameworks, I decided to try using Fleet for my latest pet project. Fleet has a very nice interface. I can call a single function to create new Clojure functions for every template in a directory. That really makes the templates feel like part of the language. Unfortunately, Glenn’s otherwise excellent article didn’t talk about how to connect Fleet into Compojure or Ring. I chose to interpret that as a compliment, springing from his high esteem of our abilities.

My first attempt, just calling the template function directly as a route handler, resulted in the following:

java.lang.IllegalArgumentException: No implementation of method: :render of protocol: #'compojure.response/Renderable found for class: fleet.util.CljString

Ah, you’ve just got to love Clojure errors. After you understand the problem, you can always see that the error precisely described what was wrong. As an aid to helping you understand the problem… well, best not to dwell on that.

The clue is the protocol. Compojure knows how to turn many different things into valid response maps. It can handle nil, strings, maps, functions, references, files, seqs, and input streams. Not bad for 22 lines of code!

There’s probably a simpler way that I can’t see right now, but I decided to have CljString support the same protocol.

Take a close look at the call to extend-protocol on lines 12 through 15. I’m adding a protocol–which I didn’t create–onto a Java class–which I also didn’t create. My extension calls a function that was created at runtime, based on the template files in a directory. There’s deep magic happening beneath those 3 lines of code.

Because I extended Renderable to cover CljString, I can use any template function directly as a route function, as in line 17. (The function views/index was created by the call to fleet-ns on line 10.)

So, I glued together two libraries without changing the code to either one, and without resorting to Factories, Strategies, or XML-configured injection.
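
For readers who don’t use Clojure, the same move can be approximated in Python with `functools.singledispatch`. This is an analogy only. It is not the original listing, and it says nothing about how Compojure or Fleet work internally. The names `render` and `TemplateString` and the shape of the response map are hypothetical stand-ins.

```python
from functools import singledispatch


@singledispatch
def render(value):
    """Stand-in for a library's 'turn this into a response' protocol."""
    raise TypeError(f"No implementation of render for {type(value).__name__}")


@render.register
def _(value: str):
    # The "library" already knows how to render plain strings.
    return {"status": 200, "headers": {}, "body": value}


class TemplateString:
    """Stand-in for a type owned by some other library."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


# The glue: extend a generic function we didn't write to cover a class we
# didn't write, without touching the code of either one.
@render.register
def _(value: TemplateString):
    return render(str(value))
```

Before the last registration, `render(TemplateString(...))` fails with a “no implementation” error much like the one above. After it, the template result can be handed straight to the rendering code.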

Metaphoric Problems in REST Systems

I used to think that metaphor was just a literary technique, that it was something you could use to dress up some piece of creative writing. Reading George Lakoff’s Metaphors We Live By, though, has changed my mind about that.

I now see that metaphor is not just something we use in writing; it’s actually a powerful technique for structuring thought. We use metaphor when we are creating designs. We say that a class is like a factory, that an object is a kind of a thing. The thing may be an animal, it may be a part of a whole, or it may be representative of some real world thing.

All those are uses of metaphor, but there is a deeper structure of metaphors that we use every day, without even realizing it. We don’t think of them as metaphors because in a sense these are actually the ways that we think. Lakoff uses the example of “The tree is in front of the mountain.” Perfectly ordinary sentence. We wouldn’t think twice about saying it.

But the mountain doesn’t actually have a front, and neither does the tree. Or if the mountain has a front, how do we know it’s facing us? What we actually mean, if we unpack that metaphor, is something like, “The distance from me to the tree is less than the distance from me to the mountain.” Or, “The tree is closer to me than the mountain is.” That we assign that to being in front is actually a metaphoric construct.

When we say, “I am filled with joy,” we are actually using a double metaphor: two different metaphors related structurally. One is “A Person Is A Container”; the other is “An Emotion Is A Physical Quantity.” Together, it makes sense to say that if a person is a container and an emotion is a physical thing, then the person can be full of that emotion. In reality, of course, the person is no such thing. The person is full of all the usual things a person is full of: tissues, blood, bones, other fluids that are best kept on the inside.

But we are embodied beings. We have an inside and an outside, and so we think of ourselves as a container with something on the inside.

This notion of containers is actually really important.

Because we are embodied beings, we tend to view other things as containers as well. It would make perfect sense to you if I said, “I am in the room.” The room is a container, the building is a container. The building contains the room. The room contains me. No problem.

It would also make perfect sense to you, if I said, “That program is in my computer.” Or we might even say, “that video is on the Internet.” As though the Internet itself were a container rather than a vast collection of wires and specialized computers.

None of these things are containers, but it’s useful for us to think of them as such. Metaphorically, we can treat them as containers. This isn’t just an abstraction about the choice of prepositions. Rather, the use of “in” and “on,” I think, reflects the way that we think about these things.

We also tend to think about our applications as containers. The contents that they hold are the features they provide. This has provided a powerful way of thinking about and structuring our programs for a long time. In reality, no such thing is happening. The program source text doesn’t contain features. It contains instructions to the computer. The features are actually sort of emergent properties of the source text.

Increasingly the features aren’t even fully specified within the source text. We went through a period for a while where we could pretend that everything was inside of an application. Take web systems for example. We would pretend that the source text specified the program completely. We even talked about application containers. There was always a little bit of fuzziness around the edges. Sure, most of the behavior was inside the container. But there were always those extra bits. There was the web server, which would have some variety of rules in it about access control, rewrite rules, ways to present friendly URLs. There were load balancers and firewalls. These active components meant that it was really necessary to understand more than the program text, in order to fully understand what the program was doing.

The more the network devices edged into Layer 7, previously the domain of the application, the more false the metaphor of program as container became. Look at something like a web application firewall. Or the miniature programs you can write inside of an F5 load balancer. These are functional behavior. They are part of the program. However, you will never find them in the source text. And most of the time, you don’t find them inside the source control systems either.

Consequently, systems today are enormously complex. It’s very hard to tell what a system is going to do once you put it into production, especially in those edge cases within hard-to-reach sections of the state space. We are just bad at thinking about emergent properties. It’s hard to design properties to emerge from simple rules.

I think we’ll find this most truly in RESTful architectures. In a fully mature REST architecture, the state of the system doesn’t really exist in either the client or the server, but rather in the communication between the two of them. We say HATEOAS, “Hypertext As The Engine Of Application State” (which is a sort of shibboleth used to identify true RESTafarians from the rest of the world), but the truth is: what the client is allowed to do is told to it by the server at any point in time, and the next state transition is whatever the client chooses to invoke. Once we have that, the true behavior of the system can’t actually be known just by the service provider.
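
To make that concrete, here is a small Python sketch of a server-side representation that carries its own list of allowed transitions. The resource, the link relation names, and the URL layout are hypothetical. No real API is being described.

```python
# Hypothetical order resource. The server decides which links to include
# based on current state; the client only ever follows links it was given.

def order_representation(order):
    """Build the representation the server would send for an order."""
    base = f"/orders/{order['id']}"
    links = {"self": base}
    if order["status"] == "open":
        links["pay"] = f"{base}/payment"
        links["cancel"] = f"{base}/cancel"
    elif order["status"] == "paid":
        links["refund"] = f"{base}/refund"
    return {"status": order["status"], "links": links}


def available_actions(representation):
    """What a client may do next, discovered from the response alone."""
    return sorted(rel for rel in representation["links"] if rel != "self")
```

The server offers the transitions, but nothing in this code says which one a given client will pick, or in what order across many resources. That part of the behavior lives in the interaction.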

In a REST architecture we follow an open world assumption. When we’re designing the service provider, we don’t actually know who all the consumers are going to be or what their individual and particular workflows may be. Therefore we have to design for a visible system, an open system that communicates what it can do, and what it has done at any point in time. Once we do that, the behavior is no longer just in the server. And in a sense it’s not really in the client either. It’s in the interaction between the two of them, in the collaborations.

That means the features of our system are emergent properties of the communication between these several parts. They’re externalized. They’re no longer in anything. There is no container. One could almost say there’s no application. The features exist somewhere in the white space between those boxes on the architecture diagram.

I think we lack some of the conceptual tools for that as well. We certainly don’t have a good metaphorical structure for thinking about behavior as a hive-like property emerging from the collaboration of these relatively independent and self-directed pieces of software.

I don’t know where the next set of metaphors will come from. I do know that the attempt to force web-shaped systems into the “application is a container” metaphor simply won’t work anymore. In truth, it never worked all that well. But now it’s broken down completely.

Time Motivates Architecture

Let’s engage in a thought experiment for a moment. Suppose that software was trivial to create and only ever needed to be used once. Completely disposable. So, somebody comes to you and says, “I have a problem and I need you to solve it. I need a tool that will do blah-de-blah for a little while.” You could think of the software the way that a carpenter thinks of a jig for cutting a piece of wood on a table saw, or a metalworker thinks of creating a jig to drill a hole at the right angle and depth.

If software were like this, you would never care about its architecture. You would spend a few minutes to create the thing that was needed, it would be used for the job at hand, and then it would be thrown away. It really wouldn’t matter how good the software was on the inside–how easy it was to change–because you’d never change it! It wouldn’t matter how it adapted to changing business requirements, because you’d just create a new one when the new requirement came up. In this thought experiment we wouldn’t worry about architecture.

The key difference between this thought experiment and actual software? Of course, actual software is not disposable. It has a lifespan over some amount of time. Really, it’s the time dimension that makes architecture important.

Over time, we need for many different people to work effectively in the software. Over time, we need the throughput of features to stay constant, or hopefully not decrease too much. Maybe it even increases in particularly nice cases. Over time, the business needs change so we need to adapt the software.

It’s really time that makes us care about architecture.

Isn’t it interesting then, that we never include time as a dimension in our architecture descriptions?

The Future of Software Development

I’ve been asked to sit on a panel regarding the future of software development. This is always risky and makes me nervous, for two reasons. First, prediction is a notoriously low success-rate activity. Second, the people you always see making predictions like this are usually well past their “use by” date. Nevertheless, here is a collection of barely-related thoughts I have on that subject.

  • Two obvious trends are cloud computing and mobile access. They are complementary. As the number of people and devices on the net increases, our ability to shape traffic on the demand side gets worse. Spikes in demand will happen faster and reach higher levels over time. Mobile devices exacerbate the demand side problems by greatly increasing both the number of people on the net and the fraction of their time they are able to access it.

  • Large traffic volumes both create and demand large data. Our tools for processing tera- and petabyte datasets will improve dramatically. Map/Reduce computing (a la Hadoop) has created attention and excitement in this space, but it is ultimately just one tool among many. We need better languages to help us think and express large data problems. In particular, we need a language that makes big data processing accessible to people with little background in statistics or algorithms.

  • Speaking of languages, many of the problems we face today cannot be solved inside a single language or application. The behavior of a web site today cannot be adequately explained or reasoned about just by examining the application code. Instead, a site picks up attributes of behavior from a multitude of sources: application code, web server configuration, edge caching servers, data grid servers, offline or asynchronous processing, machine learning elements, active network devices (such as application firewalls), and data stores. “Programming” as we would describe it today–coding application behavior in a request handler–defines a diminishing portion of the behavior. We lack tools or languages to express and reason about these distributed, extended, fragmented systems. Consequently, it is difficult to predict the functionality, performance, capacity, scalability, and availability of these systems.

  • Some of this will be mitigated naturally as application-specific functions disappear into tools and frameworks. Companies innovating at the leading edge of scalability today are doing things in application-specific behavior to compensate for deficiencies in tools and platforms. For example, caching servers could arguably disappear into storage engines and no-one would complain. In other words, don’t count the database vendors out yet. You’ll see key-value stores and in-memory data grid features popping up in relational databases any day now.

  • In general, it appears that Objects will diminish as a programming paradigm. Object-oriented programming will still exist… I’m not claiming “the death of objects” or something silly like that. However, OO will become just one more paradigm among several, rather than the dominant paradigm it has been for the last 15 years. “Object oriented” will no longer be synonymous with “good”.

  • Some people have talked about “polyglot programming”. I think this is a red herring. Polyglot is a reality, but it should not be a goal. That is, programmers should know many languages and paradigms, but deliberately mixing languages in a single application should be avoided. What I think we will find instead is mixing of paradigms, supported by a single primary language, with adjunct languages used only as needed for specialized functions. For example, an application written in Scala may mix OO, functional, and actor-based concepts, and it may have portions of behavior expressed in SQL and JavaScript. Nevertheless, it will still primarily be a Scala application. The fact that Groovy, Scala, Clojure, and Java all run on the Java Virtual Machine shouldn’t mislead us into thinking that they are interchangeable… or even interoperable!

  • Regarding Java. I fear that Java will have to be abandoned to the “Enterprise Development” world. It will be relegated to the hands of cut-rate business coders bashing out their gray business applications for $30 / hour. We’ve passed the tipping point on this one. We used to joke that Java would be the next COBOL, but that doesn’t seem as funny now that it’s true. Java will continue to exist. Millions of lines of it will be written each year. It won’t be the driver of innovation, though. As individual programmers, I’d recommend that you learn another language immediately and differentiate yourself from the hordes of low-skill, low-rent outsource coders that will service the mainstream Java consumer.

  • Where will innovation come from? Although some of the blush seems to be coming off Ruby, the reduction in hype has mainly allowed Ruby and Ruby on Rails developers to knuckle down and produce. That community continues to drive tremendous innovation. Many of the interesting developments here relate to process. Ruby developers have given us fantastic tools like Gems and Capistrano, that let small teams outperform and outproduce groups four times their size.

  • To my great surprise, data storage has become a hotbed of innovation in the last few years. Some of this is driven by the high-scalability fetishists, which is probably the wrong reason for 98% of companies and teams. However, innovations around column stores, graph databases, and key-value stores offer developers new tools to reduce the impedance mismatch between their data storage and their programming language. We spent twenty years trying to squeeze objects into relational databases. Aside from the object databases, which were an early casualty of Oracle’s ascension, we mostly focused on changing the application code through framework after framework and ORM after ORM. It’s refreshing to see storage models that are easier to use and easier to modify.

  • This will also cause another flurry of “reactive innovation” from the database vendors, just as we saw with “Universal Databases” in the mid-90s. The big players here–Microsoft and Oracle–won’t let some schemaless little upstarts erode their market share. More significantly, they aren’t about to let their flagship products–and the ones which give them beachheads inside every major corporation–get intermediated by some open-source frameworks banged up by the social network giants. Look for big moves by these vendors into high scalability, agile storage, and eventual consistency storage.

Failover: Messy Realities

People who don’t live in operations can carry some funny misconceptions in their heads. Some of my personal faves:

  • Just add some servers!
  • I want a report of every configuration setting that’s different between production and QA!
  • We’re going to make sure this (outage) never happens again!

I’ve recently been reminded of this during some discussions about disaster recovery. This topic seems to breed misconceptions. Somewhere, I think most people carry around a mental model of failover that looks like this:

Normal operations transitions directly and cleanly to failed over

That is, failover is essentially automatic and magical.

Sadly, there are many intermediate states that aren’t found in this mental model. For example, there can be quite some time between failure and its detection. Depending on the detection and notification, there can be quite a delay before failover is initiated at all. (I once spoke with a retailer whose primary notification mechanism seemed to be the Marketing VP’s wife.)

Once you account for delays, you also have to account for faulty mechanisms. Failover itself often fails, usually due to configuration drift. Regular drills and failover exercises are the only way to ensure that failover works when you need it. When the failover mechanisms themselves fail, your system gets thrown into one of these terminal states that require manual recovery.

Just off the cuff, I think the full model looks a lot more like this:

Many more states exist in the real world, including failure of the failover mechanism itself.

It’s worth considering each of these states and asking yourself the following questions:

  • Is the state transition triggered automatically or manually?
  • Is the transition step executed by hand or through automation?
  • How long will the state transition take?
  • How can I tell whether it worked or not?
  • How can I recover if it didn’t work?
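
One way to make those questions concrete is to write the model down as data and then interrogate it. The sketch below is in Python. The state names, triggers, and durations are invented for illustration. They are not taken from the diagram above or from any real system.

```python
# Hypothetical failover model: every state name and number here is made up.
# Each transition records an answer to the questions above.
TRANSITIONS = [
    {"from": "normal",            "to": "failed_undetected", "trigger": "automatic", "minutes": 0},
    {"from": "failed_undetected", "to": "failure_detected",  "trigger": "automatic", "minutes": 5},
    {"from": "failure_detected",  "to": "failing_over",      "trigger": "manual",    "minutes": 15},
    {"from": "failing_over",      "to": "failed_over",       "trigger": "automatic", "minutes": 10},
    {"from": "failing_over",      "to": "failover_failed",   "trigger": "automatic", "minutes": 10},
    {"from": "failover_failed",   "to": "failed_over",       "trigger": "manual",    "minutes": 120},
]


def dead_ends(transitions, goal):
    """States you can land in but never leave (other than the goal itself).
    Each one is a place where nobody has planned a recovery."""
    sources = {t["from"] for t in transitions}
    targets = {t["to"] for t in transitions}
    return sorted(targets - sources - {goal})


def path_minutes(transitions, start, goal, pick, seen=frozenset()):
    """Best (pick=min) or worst (pick=max) total minutes from start to goal.
    Returns None if the goal cannot be reached."""
    if start == goal:
        return 0
    totals = []
    for t in transitions:
        if t["from"] != start or t["to"] in seen:
            continue  # not an exit from this state, or it would loop back
        rest = path_minutes(transitions, t["to"], goal, pick, seen | {start})
        if rest is not None:
            totals.append(t["minutes"] + rest)
    return pick(totals) if totals else None


manual_steps = [t for t in TRANSITIONS if t["trigger"] == "manual"]
```

Comparing `path_minutes(..., min)` with `path_minutes(..., max)` shows how far the happy path is from the day the failover mechanism itself fails.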

Life’s Little Frustrations

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. -Leslie Lamport

On my way to QCon Tokyo and QCon China, I had some time to kill so I headed over to Delta’s Skyclub lounge. I’ve been a member for a few years now. And why not? I mean, who could pass up tepid coffee, stale party snacks, and a TV permanently locked to CNN? Wait… that actually doesn’t sound like such a hot deal.

Oh! I remember, it’s for the wifi access. (Well, that plus reliably clean bathrooms, but we need not discuss that.) Being able to count on wifi access without paying for yet another data plan has been pretty helpful for me. (As an aside, I might change my tune once I try a mifi box. Carrying my own hotspot sounds even better.)

Like most wifi providers, the Skyclub has a captive portal. Before you can get a TCP/IP connection to anything, you have to submit a form with a checkbox to agree to 89 pages of terms and conditions. I’m well aware that Delta’s lawyers are trying to make sure the company isn’t liable if I go downloading bootlegs of every Ally McBeal episode. But I really don’t know if these agreements are enforceable. For all I know, page 83 has me agreeing to 7 years indentured servitude cleaning Delta’s toilets.

Anyway, Delta has outsourced operations of their wifi network to Concourse Communications. And apparently, they’ve had an outage all morning that has blocked anyone from using wifi in the Minneapolis Skyclubs. When I submit the form with the checkbox, I get the following error page:

Including this bit of stacktrace:

There’s a lot to dislike here.

  1. Why is this yelling at me, the user? To anyone who isn’t a web site developer, this makes it sound like the user did something wrong. There’s a ton of scary language here: "instance-specific error", "allow remote connections", "Named Pipes Provider"… heck, this sounds like it’s accusing the user of hacking servers. "Stack trace" sure sounds like the Feds are hot on somebody’s trail, doesn’t it?
  2. Isn’t it fabulous to know that Ken keeps his projects on his D: drive? If I had to lay bets, I’d say that Ken screwed up his configuration string. In fact, the whole problem smells like a failed deployment or poorly executed change. Ken probably pushed some code out late on a Friday afternoon, then boogied out of town. My prediction (totally unverifiable, of course) is that this problem will take less than 5 minutes to resolve, once Ken gets his ass back from the beach.
  3. We mere users get to see quite a bit of internal information here. Nothing really damaging, unless of course Wilson ORMapper has some security defects or something like that.
  4. Stepping back from this specific error message, we have the larger question: is it sensible to couple availability of the network to the availability of this check-the-box application? Accessing the network is the primary purpose of this whole system. It is the most critical feature. Is collecting a compulsory boolean "true" from every user really as important as the reason the whole damn thing was built in the first place? Of course not! (As an aside, this is an example of Le Chatelier’s Principle: "Complex systems tend to oppose their own proper function.")

We see this kind of operational coupling all the time. Non-critical features are allowed to damage or destroy critical features. Maybe there’s a single thread pool that services all kinds of requests, rather than reserving a separate pool for the important things. Maybe a process is overly linearized and doesn’t allow for secondary, after-the-fact processing. Or, maybe a critical and a non-critical system both share an enterprise service—producing a common-mode dependency.
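
As one hedged illustration of the thread pool case, here is a Python sketch that gives the critical path its own pool and treats the non-critical step as best-effort. The function names (`grant_network_access`, `record_terms_acceptance`), the pool sizes, and the timeouts are hypothetical. I have no idea how the real portal is built.

```python
from concurrent.futures import ThreadPoolExecutor

# Separate pools: a backlog of non-critical work cannot starve the critical work.
critical_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="critical")
noncritical_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="noncritical")


def grant_network_access(client_id):
    """Stand-in for the thing the system exists to do."""
    return f"access granted to {client_id}"


def record_terms_acceptance(client_id):
    """Stand-in for the check-the-box bookkeeping; here its database is down."""
    raise ConnectionError("terms database is unreachable")


def handle_login(client_id):
    audit = noncritical_pool.submit(record_terms_acceptance, client_id)
    access = critical_pool.submit(grant_network_access, client_id)

    result = access.result(timeout=5)  # the critical path must succeed
    try:
        audit.result(timeout=1)        # the bookkeeping is best-effort
        recorded = True
    except Exception:
        recorded = False               # note the failure and retry after the fact
    return result, recorded
```

With the bookkeeping broken, `handle_login` still grants access and reports that the acceptance was not recorded, leaving it for secondary processing later.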

Whatever the proximate cause, the underlying problem is lack of diligence in operational decoupling.