"Is software engineering, or is it art?"
The debate between the Artisans and the Engineers has simmered, and occasionally boiled, since the very introduction of the phrase "Software Engineering". I won’t restate all the points on both sides here, since I would surely forget someone’s pet argument, and also because I see no need to be redundant.
Deep in my heart, I believe that building programs is art and architecture, but not engineering.
But, what if you’re not just building programs?
Programs and Systems
A "program" has a few characteristics that I’ll assign here:
- It accepts input.
- It produces output.
- It runs a sequence of instructions.
- Statically, it exhibits cohesion in its executable form. [*]
- Dynamically, it exhibits cohesion in its address space. [**]
** For "address space", feel free to substitute "object space", "process space", or "virtual memory". Cohesion requires that all the code that can access the address space should be regarded as a single program. (IPC through shared memory is a special case of an output, and should be considered more akin to a database or memory-mapped file than to part of the program’s own address space.)
Suppose you have two separate scripts that each manipulate the same database. I would regard those as two separate—though not independent—programs. A single instance of Tomcat may contain several independent programs, but all the servlets in one EAR file are part of one program.
For the moment, I will not consider trivial objections, such as two distinct sets of functionality that happen to be packaged and delivered in a single EAR file. It’s less interesting to me whether code does access the entire address space than whether it could. A library checkout program that includes functions for both librarians and patrons may not use common code for card number lookup, but it could. (And, arguably, it should.) That makes it one program, in my eyes.
A "System", on the other hand, consists of interdependent programs that have commonalities in their inputs and outputs. They could be arranged in a chain, a web, or a loop. No matter: if one program’s input depends on another program’s output, then they are part of a system.
Systems can be composed, whereas programs cannot.
Tricky White Space
Some programs run all the time, responding to intermittent inputs; these we call "servers". It is very common to see servers represented as deceptively simple little rectangles on a diagram. Between servers, we draw little arrows to indicate communication of some sort.
One little arrow might mean, "Synchronous request/reply using SOAP-XML over HTTP." That’s quite a lot of information for one little glyph to carry. There’s not usually enough room to write all that, so we label the unfortunate arrow with either "XML over HTTP"—if viewing it from an internal perspective—or "SKU Lookup"—if we have an external perspective.
That little arrow, bravely bridging the white space between programs, looks like a direct contact. It is Voyager, carrying its recorded message to parts unknown. It is Arecibo, blasting a hopeful greeting into the endless dark.
Well, not really…
These days, the white space isn’t as empty as it once was. A kind of luminiferous ether fills the void between servers on the diagram.
There is many a slip ‘twixt cup and lip. In between points A and B on our diagram, there exist some or all of the following:
- Network interface cards
- Network switches
- Layer 2 - 3 firewalls
- Layer 7 (application) firewalls
- Intrusion Detection and Prevention Systems
- Message queues
- Message brokers
- XML transformation engines
- Flat file translations
- FTP servers
- Polling jobs
- Database "landing zone" tables
- ETL scripts
- Metro-area SONET rings
- MPLS gateways
- Trunk lines
- Ocean liners
- Philippine fishing trawlers (see "Underwater Cable Break")
Even in the simple cases, there will be four or five computers between programs A and B, each running its own programs to handle things like packet switching, traffic analysis, routing, threat analysis, and so on.
I’ve seen a single arrow, running from one server to another, labelled "Fulfillment". It so happened that one server was inside my client’s company while the other server was in a fulfillment house’s company. That little arrow, so critical to customer satisfaction, really represented a Byzantine chain of events that resembled a game of "Mousetrap" more than a single interface. It had messages going to message brokers that appended lines to files, which were later picked up by an hourly job that would FTP the files to the "gateway" server (still inside my client’s company). The gateway server read each line from the file and constructed an XML message, which it then sent via HTTP to the fulfillment house.
It Stays Up
We analogize bridge-building as the epitome of engineering. (Side note: I live in the Twin Cities area, so we’re a little leery of bridge engineering right now. Might better find another analogy, OK?) Engineering a bridge starts by examining the static and dynamic load factors that the bridge must support: traffic density, weight, wind and water forces, ice, snow, and so on.
Bridging between two programs should consider static and dynamic loads, too. Instead of just "SOAP-XML over HTTP", that one little arrow should also say, "Expect one query per HTTP request and send back one response per HTTP reply. Expect up to 100 requests per second, and deliver responses in less than 250 milliseconds 99.999% of the time."
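That contract can live in code, not just on the diagram. Here is a minimal Java sketch of a caller-side admission check for the 100-requests-per-second figure above; the class and method names are illustrative, not any particular library's API.

```java
// Caller-side sketch: make the arrow's load contract explicit in code.
// The numbers come from the hypothetical contract in the text; the class
// and method names are illustrative, not any particular library's API.
class SkuLookupContract {
    static final int MAX_REQUESTS_PER_SECOND = 100;
    static final long LATENCY_BUDGET_MILLIS = 250;  // reply deadline, 99.999% of the time

    private long windowStart = System.currentTimeMillis();
    private int requestsInWindow = 0;

    // Refuse to send a request that would exceed the agreed rate,
    // instead of letting the receiver drown.
    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {   // start a new one-second window
            windowStart = now;
            requestsInWindow = 0;
        }
        if (requestsInWindow >= MAX_REQUESTS_PER_SECOND) {
            return false;                  // over budget: back off
        }
        requestsInWindow++;
        return true;
    }
}
```

A fixed one-second window is the crudest possible policy; the point is only that the limit is written down where a compiler, and a reader, can see it.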
It Falls Down
Building the right failure modes is vital. The last job of any structure is to fall down well. The same is true for programs, and for our hardy little arrow.
The interface needs to define what happens on each end when things come unglued. What if the caller sends more than 100 requests per second? Is it OK to refuse them? Should the receiver drop requests on the floor, refuse politely, or make the best effort possible?
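One hedged answer to those questions, sketched in Java: a bounded queue makes "refuse politely" the default when callers exceed the agreed rate. The queue size and the refusal behavior are illustrative choices, not part of any original interface.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Receiver-side sketch: a bounded queue turns "too many requests" into an
// explicit refusal instead of silent loss or an unbounded backlog.
class RequestGate {
    private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(100);

    // Returns true if accepted, false if the receiver refuses politely
    // (the caller can then retry later or degrade).
    boolean accept(String request) {
        return pending.offer(request);   // non-blocking: a full queue means refusal
    }

    String next() throws InterruptedException {
        return pending.take();           // a worker thread drains at its own pace
    }
}
```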
What should the caller do when replies take more than 250 milliseconds? Should it retry the call? Should it wait until later, or assume the receiver has failed and move on without that function?
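Whatever the answer, it should be a decision, not an accident. A minimal sketch, assuming the caller chooses bounded retries and then degrades; the attempt count and the null signal are illustrative, and real code would also enforce the 250-millisecond budget on each attempt.

```java
// Caller-side sketch of the timeout policy: retry a bounded number of times,
// then give up and move on without that function. This shows only the
// retry-then-degrade decision, not the per-attempt deadline enforcement.
class TimedCaller {
    static final int MAX_ATTEMPTS = 3;

    interface Call { String attempt() throws Exception; }

    // Returns the reply, or null to signal "receiver presumed failed".
    static String callWithRetry(Call call) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            try {
                return call.attempt();
            } catch (Exception slowOrFailed) {
                // fall through and retry
            }
        }
        return null;   // degrade gracefully instead of waiting forever
    }
}
```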
What happens when the caller sends a request with version 1.0 of the protocol and gets back a reply in version 1.1? What if it gets back some HTML instead of XML? Or an MP3 file instead of XML?
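One hedged answer: validate the reply before trying to parse it. The expected media type and the "accept any 1.x" version policy below are illustrative choices, not part of any actual specification.

```java
// Caller-side sketch: treat an unexpected reply as a defined failure case,
// not a parsing surprise. The media type and version policy are assumptions
// made for illustration.
class ReplyValidator {
    static final String EXPECTED_TYPE = "text/xml";

    static boolean isUsable(String contentType, String protocolVersion) {
        if (!EXPECTED_TYPE.equals(contentType)) {
            return false;   // an HTML error page or an MP3: refuse to parse it
        }
        // Tolerate minor-version drift (1.0 caller, 1.1 reply); refuse the rest.
        return protocolVersion != null && protocolVersion.startsWith("1.");
    }
}
```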
When a bridge falls down, it is shocking, horrifying, and often fatal. Computers and networks, on the other hand, fall down all the time. They always will. Therefore, it’s incumbent on us to ensure that individual computers and networks fail in predictable ways. We need to know what happens to that arrow when one end disappears for a while.
In the White Space
This, then, is the essence of engineering in the white space. Decide what kind of load that arrow must support. Figure out what to do when the demand is more than it can bear. Decide what happens when the substrate beneath it falls apart, or when the duplicitous rectangle on the other end goes bonkers.
Inside the boxes, we find art.
The arrows demand engineering.