Wide Awake Developers

« December 2006 | Main | February 2007 »

How to become an "architect"

Over at The Server Side, there's a discussion about how to become an "architect".  Though TSS comments often turn into a cesspool, I couldn't resist adding my own two cents.

I should also add that the title "architect" is vastly overused.  It's tossed around like a job grade on the technical ladder: associate developer, developer, senior developer, architect.  If you talk to a consulting firm, it goes more like: senior consultant (1 - 2 years experience), architect (3 - 5 years experience), senior technical architect (5+ years experience).  Then again, I may just be too cynical.

There are several qualities that the architecture of a system should be:

  1. Shared.  All developers on the team should have more or less the same vision of the structure and shape of the overall system.
  2. Incremental.  Grand architecture projects lead only to grand failures.
  3. Adaptable. Successful architectures can be used for purposes beyond their designers' original intentions.  (Examples: Unix pipes, HTTP, Smalltalk)
  4. Visible.  The "sacred, invisible architecture" will fall into disuse and disrepair.  It will not outlive its creator's tenure or interest.

Is the designated "architect" the only one who can produce these qualities?  Certainly not.  He/she should be the steward of the system, however, leading the team toward these qualities, along with the other -ilities, of course.

Finally, I think the most important qualification of an architect should be: someone who has created more than one system and lived with it in production.  Note that automatically implies that the architect must have at least delivered systems into production.  I've run into "architects" who've never had a project actually make it into production, or if they have, they've rolled off the project---again with the consultants---just as Release 1.0 went out the door.

In other words, architects should have scars. 

Planning to Support Operations

In 2005, I was on a team doing application development for a system that would be deployed to 600 locations. About half of those locations would not have network connections. We knew right away that deploying our application would be key, particularly since it is a "rich-client" application. (What we used to call a "fat client", before they became cool again.) Deployment had to be done by store associates, not IT. It had to be safe, so that a failed deployment could be rolled back before the store opened for business the next day. We spent nearly half of an iteration setting up the installation scripts and configuration. We set our continuous build server up to create the "setup.exe" files on every build. We did hundreds of test installations in our test environment.

Operations said that our software was "the easiest installation we've ever had." Still, that wasn't the end of it. After the first update went out, we asked operations what could be done to improve the upgrade process. Over the next three releases, we made numerous improvements to the installers:

  • Make one "setup.exe" that can install either a server or a client, and have the installer itself figure out which one to do.
  • Abort the install if the application is still running. This turned out to be particularly important on the server.
  • Don't allow the user to launch the application twice. Very hard to implement in Java. We were fortunate to find an installer package that made this a check-box feature in the build configuration file!
  • Don't show a blank Windows command prompt window. (An artifact of our original .cmd scripts that were launching the application.)
  • Create separate installation discs for the two different store brands.
  • When spawning a secondary application, force it's window to the front, avoiding the appearance of a hang if the user accidentally gives focus to the original window.

These changes reduced support call volume by nearly 50%.

My point is not to brag about what a great job we did. (Though we did a great job.) To keep improving our support for operations, we deliberately set aside a portion of our team capacity each iteration. Operations had an open invitation to our iteration planning meetings, where they could prioritize and select story cards the same as our other stakeholders. In this manner, we explicitly included Operations as a stakeholder in application construction. They consistently brought us ideas and requests that we, as developers, would not have come up with.

Furthermore, we forged a strong bond with Operations. When issues arose---as they always will---we avoided all of the usual finger-pointing. We reacted as one team, instead of two disparate teams trying to avoid responsibility for the problems. I attribute that partly to the high level of professionalism in both development and operations, and partly to the strong relationship we created through the entire development cycle.

"Us" and "Them"

As a consultant, I've joined a lot of projects, usually not right when the team is forming. Over the years, I've developed a few heuristics that tell me a lot about the psychological health of the team. Who lunches together? When someone says "whole team meeting," who is invited? Listen for the "us and them" language. How inclusive is the "us" and who is relegated to "them?" These simple observations speak volumes about the perception of the development team. You can see who they consider their stakeholders, their allies, and their opponents.

Ten years ago, for example, the users were always "them." Testing and QA was always "them." Today, particularly on agile teams, testers and users often get "us" status (As an aside, this may be why startups show such great productivity in the early days. The company isn't big enough to allow "us" and "them" thinking to set in. Of course, the converse is true as well: us and them thinking in a startup might be a failure indicator to watch out for!). Watch out if an "us" suddenly becomes "them." Trouble is brewing!

Any conversation can create a "happy accident;" some understanding that obviates a requirement, avoids a potential bug, reduces cost, or improves the outcome in some other way. Conversations prevented thanks to an armed-camp mentality are opportunities lost.

One of the most persistent and perplexing "us" and "them" divisions I see is between development and operations. Maybe it's due to the high org-chart distance (OCD) between development groups and operations groups. Maybe it's because development doesn't tend to plan as far ahead as operations does. Maybe it's just due to a long-term dynamic of requests and refusals that sets each new conversation up for conflict. Whatever the cause, two groups that should absolutely be working as partners often end up in conflict, or worse, barely speaking at all.

This has serious consequences. People in the "us" tent get their requests built very quickly and accurately. People in the "them" tent get told to write specifications. Specifications have their place. Specifications are great for the fourth or fifth iteration of a well-defined process. During development, though, ideas need to be explored, not specified. If a developer has a vague idea about using the storage area network to rapidly move large volumes of data from the content management system into production, but he doesn't know how to write the request, the idea will wither on the vine.

The development-operations divide virtually ensures that applications will not be transitioned to operations as effectively as possible. Some vital bits of knowledge just don't fit into a document template. For example, developers have knowledge about the internals of the application that can help diagnose and recover from system failures. (Developer: "Oh, when you see all the request handling threads blocked inside the K2 client library, just bounce the search servers. The app will come right back." Operations: "Roger that. What's a thread?") These gaps in knowledge degrade uptime, either by extending outages or preventing operations from intervening. If the company culture is at all political, one or two incidents of downtime will be enough to start the finger-pointing between development and operations. Once that corrosive dynamic gets started, nothing short of changing the personnel or the leadership will stop it.