Wide Awake Developers


Postmodern Programming

It's taken me a while to get to this talk. Not because it was uninteresting, just because it sent my mind in so many directions that I needed time to collect my scattered thoughts.

Objects and Lego Blocks 

On Thursday, James Noble delivered a keynote about "The Lego Hypothesis". As you might guess, he was talking about the dream of building software as easily as a child assembles a house from Lego bricks. He described it as an old dream, using quotes from the very first conference on Software Engineering... the one where they invented the term "Software Engineering" itself. In 1968.

The Lego Hypothesis goes something like this: "In the future, software engineering will be set free from the mundane necessity of programming." To realize this dream, we should look at the characteristics of Lego bricks and see if software at all mirrors those characteristics.

Noble ascribed the following characteristics to components:

  • Small
  • Indivisible
  • Substitutable
  • More similar than different
  • Abstract encapsulations
  • Coupled to a few, close neighbors
  • No action at a distance

(These actually predate that 1968 software engineering conference by quite a bit. They were first described by the Greek philosopher Democritus in his theory of atomos.)

The first several characteristics sound a lot like the way we understand objects. The last two are problematic, though.

Examining many different programs and languages, Noble's research group has found that objects are typically not connected to just a few nearby objects. The majority of objects are coupled to just one or two others. But the extremal cases are very, very extreme. In a Self program, one object had over 10,000,000 inbound references. That is, it was coupled to more than 10,000,000 other objects in the system. (It's probably 'nil', 'true', 'false', or perhaps the integer object 'zero'.)

In fact, object graphs tend to form scale-free networks that can be described by power laws.

Lots of other systems in our world form scale-free networks with power law distributions:

  • City sizes
  • Earthquake magnitudes
  • Branches in a roadway network
  • The Internet
  • Blood vessels
  • Galaxy sizes
  • Impact crater diameters
  • Income distributions
  • Book sales

One of the first things to note about power law distributions is that they are not normal. That is, words like "average" and "median" are very misleading. If the average inbound coupling is 1.2, but the maximum is 10,000,000, how much does the average tell you about the large scale behavior of the system?
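To see just how little the average tells you, here's a toy Ruby sketch. The specific numbers are invented for illustration, not taken from Noble's data:

```ruby
# Hypothetical inbound-coupling counts: 9,999 objects each referenced once,
# plus one pathological object (think 'nil') referenced 10,000,000 times.
couplings = Array.new(9_999, 1) + [10_000_000]

mean   = couplings.sum.to_f / couplings.size
median = couplings.sort[couplings.size / 2]

# The median is 1 and the mean is about 1,001. Neither hints at the
# single object that dominates the system's large-scale behavior.
puts "mean=#{mean.round} median=#{median} max=#{couplings.max}"
```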

(An aside: this is the fundamental problem that makes random events so problematic in Nassim Taleb's book The Black Swan. Benoit Mandelbrot also considers this in The (Mis)Behavior of Markets. Yes, that Mandelbrot.)

Noble made a pretty good case that the Lego Hypothesis is dead as disco. Then came a leap of logic that I must have missed.


"The ultimate goal of computer science is the program."

You are assigned to write a program to calculate the first 100 prime numbers. If you are a student, you have to write this as if it exists in a vacuum. That is, you code as if this is the first program in the universe. It isn't. Once you leave the unique environs of school, you're not likely to sit down with a pad of lined paper and a mechanical pencil to derive your own prime-number-finding algorithm. Instead, your first stop is probably Google.

Searching for "prime number sieve" currently gives me about 644,000 results in three-tenths of a second. The results include implementations in JavaScript, Java, C, C++, FORTRAN, PHP, and many others. In fact, if I really need prime numbers rather than a program to find them, I can just parasitize somebody else's computing power with online prime number generators.
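For the record, here is roughly what one of those search results boils down to: a minimal Sieve of Eratosthenes in Ruby (my own sketch, not any particular search hit):

```ruby
# Sieve of Eratosthenes: return the first n prime numbers.
def first_primes(n)
  # Standard upper bound on the n-th prime (valid for n >= 6):
  # p_n < n * (ln n + ln ln n)
  limit = n < 6 ? 13 : (n * (Math.log(n) + Math.log(Math.log(n)))).ceil
  composite = Array.new(limit + 1, false)
  primes = []
  (2..limit).each do |i|
    next if composite[i]
    primes << i
    (i * i).step(limit, i) { |j| composite[j] = true }
  end
  primes.first(n)
end

first_primes(100).last  # => 541, the 100th prime
```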

Noble quotes Steven Connor from the Cambridge Companion to Postmodernism:

"...that condition in which, for the first time, and as a result of technologies which allow the large-scale storage, access, and re-production of records of the past, the past appears to be included in the present."

In art and literature, postmodernism incorporates elements of past works, directly and by reference. In programming, it means that every program ever written is still alive. They are "alive" in the sense that even dead hardware can be emulated. Papers from the dawn of computing are available online. There are execution environments for COBOL that run in Java Virtual Machines, possibly on virtual operating systems. Today's systems can completely contain every previous language, program, and execution environment.

I'm now writing well beyond my actual understanding of postmodern critical theory and trying to report what Noble was talking about in his keynote.

The same technological changes that caused the rise of postmodernism in art, film, and literature are now in full force in programming. In a very real sense, we did it to ourselves! We technologists and programmers created the technologies---globe-spanning networks, high-compression codecs, indexing and retrieval, collaborative filtering, virtualization, emulation---that are now reshaping our profession.

In the age of postmodern programming, there are no longer "correct algorithms". Instead, there are contextual decisions, negotiations, and contingencies. Instead of The Solution, we have individual solutions that solve problems in a context. This should sound familiar to anyone in the patterns movement.

Indeed, he directly references patterns and eXtreme Programming as postmodern programming phenomena, along with "scrap-heap" programming, mashups, glue programming, and scripting languages.

I searched for a great way to wrap this piece up, but ultimately it seemed more appropriate to talk about the contextual impact it had on me. First, I've never been fond of postmodernism; it always seemed simultaneously precious and pretentious. Now, I'll be giving that movement more attention. Second, I've always thought of mashups as sort of tawdry and sordid---not real programming, you know? I'll be reconsidering that position as well.

Reflexivity and Introspection

A fascinating niche of programming languages consists of those languages which are constructed in themselves. For instance, Squeak is a Smalltalk whose interpreter is written in Squeak. Likewise, the best language for writing a LISP interpreter turns out to be LISP itself. (That one is more like nesting than bootstrapping, but it's closely related.)

I think Ruby has enough introspection to be built the same way. Recently, a friend clued me in to PyPy, a Python interpreter written in Python.

I'm sure there are many others. In fact, the venerable GCC is written in its own flavor of C. Compiling GCC from scratch requires a bootstrapping phase: first, a small version of GCC, written in a more portable subset of C, is compiled with some other C compiler. Then that stage-one micro-GCC compiles the full GCC for the target platform.

Reflexivity arises when a language has sufficient introspective capabilities to describe itself. I cannot help but be reminded of Gödel, Escher, Bach and the difficulties that reflexivity causes. Gödel's Theorem doesn't really kick in until a formal system is complex enough to describe itself. At that point, Gödel's Theorem proves that there will be true statements, expressed in the language of the formal system, that cannot be proven true. These are inevitably statements about themselves---the symbolic logic form of, "This sentence is false."

Long-time LISP programmers create works with such economy of expression that we can only use artistic metaphors to describe them. Minimalist. Elegant. Spare. Rococo.

Forth was my first introduction to self-creating languages. FORTH starts with a tiny kernel (small enough that it fit into a 3KB cartridge for my VIC-20) that gets extended one "word" at a time. Each word adds to the vocabulary, essentially customizing the language to solve a particular problem. It's really true that in FORTH, you don't write programs to solve problems. Instead, you invent a language in which solving the problem is trivial, then you spend your time implementing that language.

Another common aspect of these self-describing languages seems to be that they never become widely popular. I've heard several theories that attempt to explain this. One says that individual LISP programmers are so productive that they never need large teams. Hence, cross-pollination is limited and it is hard to demonstrate enough commercial demand to seem convincing. Put another way: if you started with equal populations of Java and LISP programmers, demand for Java programmers would quickly outstrip demand for LISP programmers... not because Java is a superior language, but just because you need more Java programmers for any given task. This demand becomes self-reinforcing, as commercial programmers go where the demand is, and companies demand what they see is available.

I also think there's a particular mindset that admires and relates to the dynamic of the self-creating language. I suspect that programmers possessing that mindset are also the ones who get excited by metaprogramming.
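Ruby makes that connection between metaprogramming and the self-creating language concrete. Here's a small sketch of my own (not from the talk, and the class name is invented): a program that grows its own vocabulary at runtime, one "word" at a time, FORTH-style.

```ruby
# A class that extends its own vocabulary at runtime: each call to
# 'word' defines a new instance method, much as FORTH defines a new word.
class Vocabulary
  def self.word(name, &body)
    define_method(name, &body)
  end

  word(:double) { |n| n + n }
  word(:square) { |n| n * n }
  # Words can build on earlier words, just as in FORTH.
  word(:quadruple) { |n| double(double(n)) }
end

v = Vocabulary.new
v.quadruple(10)  # => 40
```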

Expressiveness, revisited

I previously mused about the expressiveness of Ruby compared to Java. Dion Stewart pointed me toward F-Script, an interpreted, Smalltalk-like scripting language for Mac OS X and Cocoa. In F-Script, invoking a method on every object in an array is built-in syntax. Assume that updates is an array containing objects that understand the preProcess and postProcess messages:

updates preProcess
updates postProcess

That's it. Iterating over the elements of the collection is automatic.
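The nearest Ruby equivalent I know still needs an explicit block or Symbol#to_proc, though it stays terse. A sketch (the Update class here is a hypothetical stand-in, invented for illustration):

```ruby
# Hypothetical stand-in for the objects in the updates array.
class Update
  attr_reader :log
  def initialize;  @log = [];     end
  def preProcess;  @log << :pre;  end
  def postProcess; @log << :post; end
end

updates = Array.new(3) { Update.new }

# Symbol#to_proc sends the named message to each element in turn.
updates.each(&:preProcess)
updates.each(&:postProcess)
```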

F-Script admits much more sophisticated array processing: multilevel iteration, row-major processing, column-major processing, inner products, outer products, "compression" and "reduction" operations. The most amazing thing is how natural the idioms look, thanks to the clean syntax and the dynamic nature of the language.

It reminds me of a remark about General Relativity: economy of expression allows vast truths to be stated in one simple, compact equation. It would, however, require fourteen years of study to understand the notation used to write the equation, and one could spend a lifetime understanding the implications.

Technorati Tags: java, beyondjava, ruby, fscript

Ruby expressiveness and repeating yourself

Just this week, I was reminded again of how Java forces you to repeat yourself. I had an object that contains a sequence of "things to be processed". The sequence has to be traversed twice, once before an extended process runs and once afterwards.

The usual Java idiom looks like this:

public void preProcess(ActionContext context) {
  for (Iterator iter = updates.iterator(); iter.hasNext(); ) {
    TwoPhaseUpdate update = (TwoPhaseUpdate) iter.next();
    update.preProcess(context);
  }
}

public void postProcess(ActionContext context) {
  for (Iterator iter = updates.iterator(); iter.hasNext(); ) {
    TwoPhaseUpdate update = (TwoPhaseUpdate) iter.next();
    update.postProcess(context);
  }
}

Notice that there are only two symbols different between these two methods, out of 20 semantically significant symbols. According to the Pragmatic Programmers, even iterating over the collection counts as a kind of repetition (and therefore a violation of DRY: Don't Repeat Yourself).

The Ruby equivalent would be something like:

def preProcess(context)
  updates.each { |u| u.preProcess(context) }
end

def postProcess(context)
  updates.each { |u| u.postProcess(context) }
end

Now, there are two differing symbols out of 10 (20% variance instead of 10%). There's been no loss of expressiveness; in fact, the main intention of the code is clearer in the Ruby version than in the Java version.

Can we make the variance higher? Perhaps.

def preProcess(context)
  each_update(:preProcess, context)
end

def postProcess(context)
  each_update(:postProcess, context)
end

def each_update(method, context)
  updates.each { |u| u.send(method, context) }
end

Now the two primary methods have 2 symbols out of 7 different, or nearly 29%. The expressiveness is damaged a little bit by the dynamic dispatch via "send". It would be unthinkable to use reflection in Java to make the code clearer. (Anyone who's worked with reflection knows what I mean.) Here, it's not unthinkable, but it might just not help clarity.

Technorati Tags: java, beyondjava, ruby

Programmer productivity measurements don't work.


The most common metric was discredited decades ago, but continues to be used: KLOC. Only slightly better are function points. At least they're tied to some deliverable value. Still, the best function point is the one you don't have to develop. Likewise, the best line of code is the one you don't need to write. In fact, sometimes my most productive days are the ones in which I delete the most code. Why are these metrics so misleading?

Because they are counting inventory as an asset. Lines of code are inventory. Function points are inventory. Any metric that only measures the rate of inventory production is fatally flawed. We need metrics that measure throughput instead.

Technorati Tags: lean, agile

More Beanshell Goodness

Thanks to the clean layered architecture in our application, we've got a very clear interface between the user interface (just Swing widgets) and the "UI Model". In canonical MVC terms, our UI Model is part controller and part model. It isn't the domain model, however. It's a model of the user interface. It has concepts like "form" and "command". A "form" is mainly a collection of property objects that are named and typed. The UI interacts with the rest of the application by binding to the properties.

The upshot is that anything the UI can do by setting and getting properties (including executing commands via CommandProperty objects) can be done through test fixtures or automated interfaces. Enter beanshell.

After integrating beanshell, all of our forms and properties were immediately available. Today, I worked with one of my teammates to build a beanshell script to drive through the application. It creates a customer and goes through the entire workflow. Run the script a million times or so, and you've got a great pile of test data. Schema changes? Domain model changes? No problem. Just re-run the script (and wait an hour or so) and you've got updated test data.

Technorati Tags: java

Smalltalk style prototyping for Java?

I've been eyeing Beanshell for some time now. It's a very straightforward scripting language for Java. Its syntax is about what you would expect if I said, "Java with optional types and no need to pre-declare variables." So, a Java programmer probably needs all of about thirty seconds to understand the language.

What I didn't expect was how quickly I could integrate it into my applications. Here's an example. I've got a Swing desktop application for which I wanted to add a small command shell pane. I spent about an hour working with Swing's JTextArea and rolling my own parser. It was late at night and I was short on bright ideas. Finally, about the time I realized I was going to need variables and flow control, I pulled the emergency stop cord and backed up.

After downloading the full beanshell JAR file (about 280K) and adding it to my build path, all I had to do was this:

JConsole console = new JConsole();   
frame.getContentPane().add(new JScrollPane(console), BorderLayout.SOUTH);  
Interpreter interpreter = new Interpreter(console);   
new Thread(interpreter).start(); 

Those four lines of code are literally all that is needed to get a beanshell prompt running inside of your application. From the prompt, you have access to every class and method within your application. If you've got some kind of object registry or namespace, or any kind of Singletons, then you'll also have access to real object instances.

Beanshell has a built-in command called "desktop()". That single command launches a Smalltalk-style IDE, with class browser and interpreter window. This desktop is still part of your application's JVM. It lacks most of the power of Smalltalk's library, which evolved together with the workspace to support highly dynamic programming. Nevertheless, the Beanshell desktop retains the immediacy of working in Smalltalk.


  • Beanshell
  • Squeak - a Free, modern Smalltalk
  • Java Object Inspector - A simple Swing-based inspector that can be launched on any Java object from a Beanshell prompt. A great complement to Beanshell.

Technorati Tags: agile, java

I think I'd like to

I think I'd like to do some Smalltalk (or Squeak) development sometime. Just for myself. It would be good for me -- like an artist going to a retreat and setting aside all notions of practicality. I know I'll never work in Squeak professionally. That's why it would be like saying to yourself, "In this now, purity of expression is all that matters. Tomorrow, I will worry about making something I can sell. Tomorrow I will design so the mediocre masses that follow me cannot corrupt it. Today, I will work for the joy I find in the work."

The burdens of responsibility leave no room for such indulgence. So I turn back to Java and C#. I'll write another Address class and deal with another session manager, and more cookies. Always with the cookies.


For the ultimate in temporal, architectural, language and spatial decoupling, try two of my favorite fluid technologies: publish-subscribe messaging and tuple-spaces.

Designing for Emergent Behavior

Lately, I've been grooving on emergent behavior. This fuzzy term comes from the equally fuzzy field of complexity studies. Mix complex rules together with non-linear effects (like humans) and you are likely to observe emergent behavior.

Recent example: web browser security holes. Any program inherently constitutes a complex system. Add in some dynamic reprogramming, downloadable code, system-level scripting, and millions upon millions of users and you've got a perfect petri dish. Sit back and watch the show. Unpredictable behavior will surely result.

In fact, "emergent" sometimes gets used as a synonym for "unpredictable". By and large, I believe that's true. In traditional systems design, "unpredictable" definitely equals "sloppy". Command-and-control, baby. Emergent behavior is what happens when your program goes off the rails.

The thing is, emergent behavior is where all the really interesting things happen. Predictable programs are boring. Big batch runs are predictable.

But, you have to consider the complete system. In a big batch run, the system is linear: inputs, transformation, outputs. No feedback. No humans. When you include humans in your view of the system, all these messy feedback loops start to appear. It gets even worse when you have multiple humans connected via the programs. Feedback loops that stretch from one person, through at least two programs, out to another person and back.

Any system that involves humans will exhibit emergent behaviors -- and this is a very good thing.

Are "designed" behavior and "emergent" behavior inherently incompatible? I don't think so. I think it may be possible to design for emergent behavior. I mean that certain designs will encourage some kinds of emergent behavior, whereas other designs encourage other kinds of emergent behavior. We can study the behaviors produced by various systems and designs to build a compendium of factors that are likely to facilitate one class of behavior or another.

For example: In every corporation, I see large volumes of data stored and shared in two different formats. The nature of the two systems encourages very different behaviors.

First we have relational databases. These tend to be large, expensive systems. As a result, they are centralized to one degree or another. The nature of relational algebra is that of a static schema. Therefore, changes are rigidly controlled. Centralized, rigidly controlled assets require guardians (DBAs) and gatekeepers (data modelers). Because the schema is well-defined and changes slowly, the database gains a degree of transparency. Applications are integrated through their databases. Generic tools for backup, reporting, extraction, and modeling become possible. The data can be accessed from a variety of applications in a relatively generic fashion.

The other data storage tool I see used widely is the spreadsheet. I almost never see a spreadsheet used to calculate numbers. Instead, most are used as a schema-less data storage tool. Often created directly by business analysts, these spreadsheets are very conducive to change. Sharing is as simple as sending the file through email. Of course, this leads to version conflicts and concurrent update issues that have to be settled by hand (usually by printing a timestamp on the hardcopies!) There is no central definition of the data structure. Indeed, neither the data nor the structures from spreadsheets can be reused. A spreadsheet makes the 2-dimensional structure of a table obvious, but it makes relationships difficult, if not impossible, to represent. Ergo, spreadsheet users don't do relationships. Access to the spreadsheets is always mediated by a single application.

So, two different systems. Both store structured (or at least semi-structured) data. The nature of each produces very different emergent behaviors. In one case, we find the evolution of acolytes of the RDBMS. In the other case, we find that a numeric analysis tool is being used for widespread data storage and sharing.

Given enough examples, enough time, and enough study, can we not learn to extrapolate from the essential nature of our designs to the most probable emergent behaviors? Perhaps even to select the emergent behaviors we desire first and, starting from those, decide what essential nature our designs must embody to make those behaviors most likely?

Names have Power

Names have power. Shamanic primitives guard their true names -- give me your name and you give me power over you. In the ether, your name is your only identity. Give away your name and you give away yourself. No cause, issue, or crusade has a follower until it has a name. A good name evokes images, emotions. A well-named issue becomes uncontestable. (Who is really opposed to "family values", anyway?)

Naming things well may be one of the hardest jobs in design. Somebody once said that object-oriented design was about creating the language that you would use to solve the problem. Start with the language (a collection of names, and rules about how to assemble the names), then deal with the problem.

I'm struggling with naming something right now. I can sense what it is. There is a real thing there. I can feel it. I need to define it, give it boundaries. When I can name it, I will give it life.

Find the line, find the shape
Through the grain
Find the outline, things will
Tell you their name
--Suzanne Vega

The best name I've come up with yet is fluid. There are fluid methods, fluid tools, fluid technologies, fluid designs, and so on. Things that are fluid welcome change. They adapt. They are pleasant to modify. If I have a fluid architecture, then integrating a new system into the mix does not cause massive headaches and heartburn. (Hmmm. So dropping a new system into a fluid architecture doesn't cause a ripple effect? Right. See how hard it is to name things?) Fluid "stuff" does not resist change. Being fluid means nothing is ever carved in stone. Things that are fluid encourage certain emergent properties that we value: fast, flexible, joyous.

Pah. That's damn close to gibberish.

Let's try analogy and contrast:

Fluid                          Not fluid
-----                          ---------
Publish-subscribe messaging    Flat file integration
Typeless languages             Strongly-typed languages
Tuple-spaces                   Relational databases
eXtreme Programming            SEI CMM Level 5
Cross-functional teams         Silos
Whiteboard task lists          GANTT charts
20-person startup              The same company at 150 people

Does that help? The items on the left share some essential, underlying attributes. The things on the right lack those attributes; they embody different values. (I don't like the semiotics of "fluid". Call that a working title, not a true name. Besides, the natural opposites of "fluid" would be "solid" or "concrete". These are both positively-connoted terms.)

So what can I name this quality? Is there really something essential there, or is this just reflecting nothing more than the way I like to work?