Wide Awake Developers


The Agile Manifesto is explicit about it. "We value individuals and interactions over processes and tools." How should an Agile team—more specifically, an XP team—respond to the IT Infrastructure Library (ITIL), then? After all, ITIL takes seven books just to define the customizable framework for the actual practices. An IT organization usually takes at least seven more binders to define its actual processes.

Can XP and ITIL coexist in the same building, or is XP just incompatible with ITIL? In short: they can coexist.

ITIL and XP (or agile in general) are not fundamentally incompatible, but there will definitely be an interface between the XP world and the ITIL world. Whether this interface becomes an impedance barrier or not depends entirely on the way that your company chooses to implement ITIL.

I’ll run down the Service Support processes and identify some of the problems I’ve encountered. (I’m focusing on Service Support because businesses tend to implement these processes first. Few of them get far enough down the road to really attack the Service Delivery processes. It’s a shame, because I see a lot of value in the Service Delivery approach.) I will cover the service delivery processes in a future article.

Service Desk

An effective service desk can be a great asset to any team, including an XP team. Getting accurate feedback on issues your users are having can only benefit your development efforts and ultimately, the users themselves. The key here is to make sure that the service desk is well-prepared to accept responsibility for support calls on your app.

I strongly recommend that you start working with the service desk at least six weeks before your first application release. If the service desk is mature, they’ll have job aids for capturing app support needs. These will provide the minimum initial information needed for the knowledge base. The service desk personnel will augment that knowledge base over time with whatever solutions, rumors, superstitions and folk remedies they come up with. Be sure you have access to the knowledge base, so you can help weed out the "false solutions."

You also want to get on the distribution list for ticket reports from the service desk. These will tell you what issues your users are encountering. Commonly recurring or high-impact issues should become cards for consideration in your next iteration. This feeds your interface to the Problem Management process.

If the service desk is not mature, you haven’t prepared them well, or they do not perform resolution for application incidents, you will be looped in as part of the Incident Management process, below. This has some special challenges.

Incident Management

ITIL defines an "incident" as any disruption to the normal operation of a system or application.  This includes bugs, outages, and even "PEBKAC" problems.  The Incident Management process begins with notification of an incident.  This can be logged by the service desk in response to a user call.  It can even be automatically created by a monitoring system.  It ends when normal functioning of the system is restored.

Note that this does not include root cause analysis or correction!  Incident Management is all about restoring service.

Ideally, the service desk handles the entire Incident Management process and your team will not even need to be involved.  In less ideal cases, you may be called on to help resolve "novel" incidents–ones that do not have a solution in the service desk’s knowledge base.

When incidents come into the development room, you have some negative forces to deal with. By definition, the incident needs to be resolved expeditiously, making it both interrupt-driven and urgent. Therefore, every incident will automatically split a pair and take somebody off their card. This is damaging to flow.

In worse cases, the entire team may get derailed and start huddling around the incident. Fire-fighting is exciting quadrant I work. It’s natural to get a rush from being the hero. The problem is obvious, though.  If the entire team is chasing the incident, nobody is making forward progress on the iteration. If you have a large user community or a lot of incidents, you can lose an entire day—or an entire iteration—before you realize it.

This can be exacerbated if your service desk never resolves application support incidents. In such cases, I recommend the "Designated Sacrifice" pattern. Assign one member of the team to handle the "Bat-Phone" calls and be the primary point of contact for incident resolution. This is a crappy job—you get pulled away constantly, can’t maintain focus, get almost no card work done—so you’ll want to rotate that position frequently. (On the other hand, there is that hero factor that provides some consolation.) Even doing it for one full iteration can be very demoralizing.

Problem Management

Recurring incidents can be identified as Problems that require correction. This is the job of the Problem Management process.

Identifying a Problem is often done by the service desk, but it can also come from other quarters. The decision about which Problems require correction often becomes very slow and bureaucratic. This is a process you want to work with very closely. Problem Management typically tolerates a much higher level of outstanding defects than an XP team wants to allow. I’ve seen teams get chewed out for fixing Problems that weren’t scheduled to be addressed for a couple of iterations! Imagine how surreal that meeting feels!

Problem managers should be encouraged to write cards. Your team should even reserve a fraction of your velocity in each iteration just to handle Problems. You also need to communicate back to the problem managers when Problem cards are completed. Really good Problem Management identifies a few problem states such as "known problem", "known workaround", and "known solution". An XP team will typically move through these states pretty quickly.

Bear in mind that the ITIL definition of Problem Management is all about oversight, not the actual changes needed to fix the problem.  The actual changes are deployed as part of Release Management.

Change Management

No part of ITIL gives more people cold sweats than Change Management.  This is the process that so easily slips into heavyweight bureaucracy or, worse, meaningless change advisory board (CAB) meetings.

Change Management as defined simply means tracking changes and their impact on configuration items, and ensuring that changes are applied in an orderly way.  It doesn’t have to hurt.

In reality, however, XP teams will spend a lot of time preparing for change advisory board meetings. Beware: the XP team may get a bad reputation for creating "too much" change.

I recommend standardizing your change and deployment process. Get into a regular rhythm of releases and deployments so the CAB just knows to expect that every third Tuesday (or whenever), your team will have a deployment. Standardize the deployment mechanics and system impact statement so you can templatize and re-use your change requests. Familiarity will create confidence with the CAB. Constantly showing them change requests they’ve never seen before will raise their level of scrutiny.

Failed changes also trigger more scrutiny. Your XP team will have an advantage here, because your rigorous approach to automated testing will reduce the incidence of failed changes, right?

Configuration Management

Configuration Management is *not* the act of changing configuration items. It’s the process for tracking planned, executed, and retired configurations. As you plan each release, you should identify the configuration items (CIs) that will be affected by the release.

In a well-executed ITIL rollout, configuration management is vital for change management, incident management, the service desk, and release management. In a poorly-executed ITIL rollout, configuration management doesn’t exist, or it only addresses servers or network devices.

CM should cover servers, network topology, applications, business processes, documentation, and the dependencies among all of them. That way, a proposed change to one CI (e.g., an upgrade to the front-end firewalls) can be analyzed for its impact on the others. This is CM nirvana, seldom achieved.

The XP team should have an advantage here again, because you’ve already broken story cards down to tasks at the beginning of an iteration. That means you already know which applications and servers will be changed in that iteration. Roll up a few iterations into a release, and the CIs affected by the release should be well known.

On the other hand, if you’ve taken XP to its "no documentation" extreme, then you will not have tracked the CIs touched by each iteration. This underscores a common misinterpretation of XP; it doesn’t eschew all documentation, just the documentation that doesn’t add value from the customer’s perspective. So, does tracking changes against CIs add value from the customer’s perspective? Not directly, no. There is an indirect benefit, in that the customer will receive better uptime and performance, but that may seem remote to the team. The best I can say is that this is one place where you’ll have to chalk it up to "necessary overhead".

Release Management

This is an easy one to integrate with your XP team. Release Management dovetails quite naturally with XP’s release planning cycle. Engage early, though, because the ITIL process will likely require longer lead times than your team is used to.

Heads Down

I’ve been quiet lately for a couple of reasons.

First, I’m thrilled to say that I’m joining the No Fluff, Just Stuff stable of speakers.  It’s an honor and a pleasure to be invited to keep such company.  The flip side is, I’m spending a lot of my free time polishing up my inventory of presentations.  More frankly, I’m rebuilding them all with Keynote.  (Brief aside, I’m coming to love Keynote.  It has some flaws and annoyances, but the result is worth it!)

I’ll debut the first of these new presentations at OTUG on May 15th.  I’ll be speaking about "Design for Operations".  The talk will be about 70% from the last part of Release It!, and about 30% original content.  OTUG will be giving away a couple of copies of my book, but you have to be there to win!

Finally, I’m working on an article about performance and capacity management.  Most capacity planning work is done entirely within Operations, without much involvement from Development.  At the same time, most developers don’t have a visceral appreciation for how dramatically the application’s efficiency can affect the system’s overall profitability.

This article will show the relationship between application response time, system capacity, and financial success.  I’m hoping to include a simulator app for download that you can use to play with different scenarios to see what a dramatic difference 100ms can make. 

Coach and Team From Same Firm

Is it an antipattern to have a consulting firm provide both the coach and developers?  By providing the developers, the firm is motivated to deliver on the project, with coaching as an adjunct.  If, instead, the firm provides just the coach, it will be judged by how well the client adopts the process.  These two motives can easily conflict.

Case in point: at a previous client of mine, my employer was charged with completing the project, using a 50-50 mix of contractors and client developers.  My employer, a consulting firm, provided several developers experienced with XP and Scrum, as well as an agile coach.  The firm was thus charged with two imperatives: first, deliver the project; second, introduce agile methods within the client. 

With project success as a requirement, the firm decided to interview the developers at the outset of the project. The client’s developers (rightly) perceived that they were interviewing for their own jobs.  This started a negative dynamic that ultimately resulted in 80% attrition among the client’s developers.

On a pure coaching engagement, the coach would probably have "made do" with whomever the client provided. 

We delivered all the features, basically on time, with very high quality. Financially speaking, it was a success, generating more orders and more revenue per order than its predecessor.  It is harder to say that the engagement as a whole was a success, though.  Almost all of the developers were contractors, so the client got their product, but very little adoption of agile methods.

Perhaps if the coach and the contract developers had come from different firms, the motivations would not have been as tangled, and more of the client’s valuable people would have stayed.  The team might not have suffered from the strained, unhealthy environment from the early days of the project.

Then again, perhaps not.  The client may have been expecting that level of attrition. Maybe that’s just to be expected when you are trying to bring a random selection of corporate developers over to agile methods, especially if the methods are decreed from above instead of brought upward by grass-roots. Maybe the dynamic would have existed even with a coach that was totally disinterested in the project outcome.

Moving Your Home Directory on Leopard

Since NetInfo Manager is going away under Leopard, we’ve got a gap in capability. How do you relocate your home directory without the GUI?

There are a few reasons you might want to move your home directory to another volume. For example, you might reinstall your OS frequently. Or, perhaps you just want to keep your data on a bigger disk than the one that came in the machine. In my case, both.

The venerable NetInfo is being replaced entirely with Directory Services. (Try "man 8 DirectoryServices" for more information.) There’s a handy command-line tool you can use to interact with the DirectoryServices.

Let’s start by opening up a Terminal window. (Applications > Utilities > Terminal) At first, you’ll be logged in as yourself, not as root.

Last login: Wed Dec 31 18:00:00 on ttyp0
donk:~ mtnygard$ 

The first thing is to get out of your home directory, because we’re going to delete it in about a minute and a half. Change to the root directory and make yourself into the root user with "sudo".

Last login: Wed Dec 31 18:00:00 on ttyp0
donk:~ mtnygard$ sudo su -
donk:~ root#

Next, fire up "dscl", the directory services command line. Without arguments, this gives you an interactive, shell-like environment to explore the directory. It also spews a bunch of help messages. If you give it "localhost", then it quietly assumes you wanted to interact with the directory.

You can list entries, cd around the directory hierarchy, and even create entries or change attributes.

User information is stored under /Local/Users, so we’ll cd to that now.

donk:~ root# dscl localhost
 > cd /Local/Users
/Local/Users >

Now, running "ls" will show you all the users that your machine knows.  Try it now.

donk:~ root# dscl localhost
 > cd /Local/Users
/Local/Users > ls
/Local/Users >

Holy crap!  Who the hell are all these people?

Well, of course, they aren’t people.  All the usernames starting with an underscore are application IDs.  Root, nobody, and daemon are all part of the OS.  Once you eliminate them, there should just be the people you’ve actually created accounts for.  If you see any names you don’t recognize at this point, this would be a good time to shut off your network connection.
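
If the list scrolls off the screen, a small filter can hide the system entries. What follows is only a sketch: the function name "human_users" is made up for illustration, and it assumes the naming convention described above holds on your machine. The canned names stand in for real output so it runs anywhere; on a Mac, I'd expect to feed it from "dscl . -list /Users" instead.

```shell
#!/bin/sh
# Hypothetical helper: drop service accounts from a list of usernames on stdin.
# Assumes the convention described above: a leading underscore marks an
# application ID, and root, nobody, and daemon belong to the OS.
human_users() {
    grep -v -e '^_' -e '^root$' -e '^nobody$' -e '^daemon$'
}

# On a Mac you would feed it real data, for example:
#   dscl . -list /Users | human_users
# Canned names are used here so the sketch runs on any Unix.
printf '%s\n' _www _mysql daemon nobody root mtnygard | human_users
```

Anything left over that you don't recognize deserves a closer look.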

At this point, you could "cd" directly into the entry for your user.  It won’t show you anything special; users do not have subnodes in the directory.  It would set up your context for future commands, limiting them to just that user.  In this case, however, we’ll stay at /Local/Users and run "cat" on my username.

/Local/Users > cat mtnygard
dsAttrTypeNative:_writers_hint: mtnygard
dsAttrTypeNative:_writers_jpegphoto: mtnygard
dsAttrTypeNative:_writers_passwd: mtnygard
dsAttrTypeNative:_writers_picture: mtnygard
dsAttrTypeNative:_writers_realname: mtnygard
dsAttrTypeNative:authentication_authority: ;ShadowHash;
dsAttrTypeNative:generateduid: 7F6A8EDE-63EC-4A34-9391-031A9C77806D
dsAttrTypeNative:gid: 501
dsAttrTypeNative:home: /Users/mtnygard
dsAttrTypeNative:jpegphoto:
 ffd8ffe0 00104a46 49460001 01000001 00010000 ffdb0043 00020202 ... 7fffd9
dsAttrTypeNative:name: mtnygard
dsAttrTypeNative:passwd: ********
dsAttrTypeNative:picture:
 /Library/User Pictures/Sports/Tennis.tif
dsAttrTypeNative:realname:
 Michael Nygard
dsAttrTypeNative:shell: /bin/bash
dsAttrTypeNative:uid: 501
AppleMetaNodeLocation: /Local/Default
AuthenticationAuthority: ;ShadowHash;
GeneratedUID: 7F6A8EDE-63EC-4A34-9391-031A9C77806D
JPEGPhoto:
 ffd8ffe0 00104a46 49460001 01000001 00010000 ffdb0043 00020202 ... 7fffd9
NFSHomeDirectory: /Users/mtnygard
Password: ********
Picture:
 /Library/User Pictures/Sports/Tennis.tif
PrimaryGroupID: 501
RealName:
 Michael Nygard
RecordName: mtnygard
RecordType: dsRecTypeStandard:Users
UniqueID: 501
UserShell: /bin/bash
/Local/Users >

Hmm. Seems like it must mean something. This is listing the values of all the attributes of my user profile. It’s what I want, but there’s a big pile of noise in the middle. That noise is a textual representation of my profile’s JPEG. (I’ve edited it out of this transcript.) If you scroll up past that, you’ll see the attribute of real interest.

The property dsAttrTypeNative:home tells the OS where to find my home directory.

I can change it with dscl’s "change" command. The format of change is a little strange because it has to deal with multi-valued properties (as do all of the directory services commands.)

/Local/Users > change mtnygard dsAttrTypeNative:home /Users/mtnygard /Volumes/Data/mtnygard
/Local/Users >

The first parameter is the object to change, and the second is the attribute to change. The third is the old value that you want to replace (each attribute holds a multi-valued list, remember). Finally, the fourth is the new value you want to set.
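
If you'd rather not type this into the interactive shell, dscl also accepts a command directly on its command line. Here's a cautious sketch that only prints the one-shot command, so you can check the four parameters before running anything. The helper name "print_home_change" is hypothetical, and I'm assuming the one-shot "-change" form takes the same arguments in the same order as the interactive one, and that neither path contains spaces.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the one-shot dscl command instead of
# running it, so the parameters can be checked before touching the directory.
# Assumes the same data source ("localhost") and record path (/Local/Users)
# used in the interactive session, and paths without spaces.
print_home_change() {
    user=$1      # object to change
    old_home=$2  # old value to replace
    new_home=$3  # new value to set
    printf 'dscl localhost -change /Local/Users/%s dsAttrTypeNative:home %s %s\n' \
        "$user" "$old_home" "$new_home"
}

print_home_change mtnygard /Users/mtnygard /Volumes/Data/mtnygard
```

Once the printed line looks right, you can paste it into the root shell yourself.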


Not quite done yet, though. I’ve given the OS a bogus home directory. There’s no such directory as /Volumes/Data/mtnygard yet.

To get there, I have to move my directory from under /Users to the new location. I have to do this as root, but I don’t want root to end up owning all my personal stuff. Fortunately, "cp" has an option for that: "-p" preserves ownership, permissions, and timestamps.

donk:~ root# cp -Rp /Users/mtnygard /Volumes/Data/
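
Before trusting this with a whole home directory, it's easy to check what "-p" does on a throwaway tree. This sketch uses invented directory and file names under a temporary directory, and it only demonstrates that permission bits survive the copy. Ownership is preserved too when you run as root, but the sketch doesn't need root and doesn't test that.

```shell
#!/bin/sh
# Sketch: confirm that "cp -Rp" keeps permission bits on a throwaway tree.
# The directory and file names here are invented for the demo.
work=$(mktemp -d)
mkdir -p "$work/src/home/Documents" "$work/dest"
echo "private" > "$work/src/home/Documents/notes.txt"
chmod 600 "$work/src/home/Documents/notes.txt"

# Same flags as the real copy: -R recurses, -p preserves mode and timestamps
# (and ownership, when run as root).
cp -Rp "$work/src/home" "$work/dest/"

# The first ten characters of "ls -l" are the file type and permission bits.
src_mode=$(ls -l "$work/src/home/Documents/notes.txt" | cut -c1-10)
dest_mode=$(ls -l "$work/dest/home/Documents/notes.txt" | cut -c1-10)
echo "source: $src_mode  copy: $dest_mode"

# Clean up the throwaway tree.
rm -rf "$work"
```

If the two strings differ, stop and sort that out before copying anything you care about.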

Now, we’re almost, almost done. Log off and log back on into your roomy new home directory.


  1. I don’t know how to do this if you’ve got a shared directory tree set up.  You might have that if you’re on a Mac network at work, for example.  You should definitely try this at home.
  2. The "cp" command I use will do really funky things if you’ve got hard links, symlinks, or especially circular symlinks in your home directory.  Then again, if you’ve done that to yourself, you probably know enough Unix to work out your own parameters for "cp", "tar", "mv" or "cpio".
  3. One more thing: I’m not sure if this is from running a developer seed of Leopard, or if it’s due to this home directory move technique, but I keep running into permissions problems. I couldn’t automatically install dashboard widgets, for example. Adium complained that it couldn’t create its "sounds" directory.

What Makes a POJO So Great, Anyway?

My friend David Hussman once said to me, "The next person that says the word ‘POJO’ to me is going to get stabbed in the eye with a pen."  At the time, I just commiserated about people who follow crowds rather than making their own decisions.

David’s not a violent person.  He’s not prone to fits of violence or even hyperbole.  What made this otherwise level-headed coach and guru resort to non-approved uses of a Bic?

This weekend at No Fluff, Just Stuff, I had occasion to contemplate POJOs again.  There were many presentations about "me too" web frameworks.  These are the latest crop of Java web frameworks that are furiously copying Ruby on Rails features as fast as they can.  These invariably make a big deal out of using POJOs for data-mapped entities or for the beans accessed by whatever flavor of page template they use. (See JSF, Seam, WebFlow, Grails, and Tapestry 5 for examples.)

Mainly, I think the infuriating bit is the use of the word "POJO" as if it’s a synonym for "good".  There’s nothing inherently virtuous about plain old Java objects.  It’s a retronym; a name made up for an old thing to distinguish it from the inferior new replacement.

People only care about POJOs because EJB2 was so unbelievably bad.

Nobody gives a crap about "POROs" (Plain old Ruby objects) because ActiveRecord doesn’t suck.

Flash Mobs and TCP/IP Connections

In Release It!, I talk about users and the harm they do to our systems.  One of the toughest types of user to deal with is the flash mob.  A flash mob often results from Attacks of Self-Denial, like when you suddenly offer a $3000 laptop for $300 by mistake.

When a flash mob starts to arrive, you will suddenly see a surge of TCP/IP connection requests at your load-distribution layer.  If the mob arrives slowly enough (fewer than 1,000 connections per second) then the app servers will be hurt the most.  For a really fast mob, like when your site hits the top spot on digg.com, you can get way more than 1,000 connections per second.  This puts the hurt on your web servers.

As the TCP/IP connection requests arrive, the OS queues them for servicing by the application.  On most stacks, the server’s TCP/IP stack sends back the SYN/ACK packet on its own and holds the connection in that queue until the application gets around to calling "accept" on the server socket.  (There’s a third step in the handshake, but we can skip it for the moment.)  At that point, the server hands the established connection off to a worker thread to process the request.  Meanwhile, the thread that accepted the connection goes back to accept the next one.

Well, when a flash mob arrives, the connection requests arrive faster than the application can accept and dispatch them.   The TCP/IP stack protects itself by limiting the number of pending connection requests, so if the requests arrive faster than the application can accept them, the queue will grow until the stack has to start refusing connection requests.  At that point, your server will be returning intermittent errors and you’re already failing.

The solution is much easier said than done: accept and dispatch connections faster than they arrive.
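
To get a feel for how little slack the pending-connection queue gives you, here's a back-of-the-envelope sketch. All three numbers are assumptions picked for illustration (real backlog limits vary by OS and configuration), and the function name is made up.

```shell
#!/bin/sh
# Back-of-the-envelope sketch; every number here is an assumed example,
# not a measurement.
arrival_rate=1000   # connection requests arriving per second
accept_rate=800     # connections the application accepts per second
backlog=128         # pending-connection limit in the TCP/IP stack

# Time until the queue is full, in milliseconds (integer shell arithmetic).
# Arguments: arrival rate, accept rate, backlog limit.
# Prints "never" when the application keeps up with arrivals.
time_to_full_ms() {
    if [ "$1" -le "$2" ]; then
        echo never
    else
        echo $(( $3 * 1000 / ($1 - $2) ))
    fi
}

time_to_full_ms "$arrival_rate" "$accept_rate" "$backlog"
```

With these made-up numbers, the queue is full in 640 milliseconds. After that, the stack starts refusing connections.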

Filip Hanik compares some popular open-source servlet containers to see how well they stand up to floods of connection requests.  In particular, he demonstrates the value of Tomcat 6’s new NIO connector.  Thanks to some very careful coding, this connector can accept 4,000 connections in 4 seconds on one server.  Ultimately, he gets it to accept 16,000 concurrent connections on a single server.  (Not surprisingly, RAM becomes the limiting factor.)

It’s not clear that these connections can actually be serviced at that point, but that’s a story for another day.

Release It! Is Released!

"Release It!" has been officially announced in this press release.  Andy Hunt, my editor, also posted announcements to several mailing lists.

It’s been a long road, so I’m thrilled to see this release.

When you release a new software system, that’s not the end of the process, but just the beginning of the system’s life.  It is the same thing here.  Though it’s taken me two years to get this book done and on the market, this is not the end of the book’s creation, but the beginning of its life.


Self-Inflicted Wounds

My friend and colleague Paul Lord said, "Good marketing can kill you at any time."

He was describing a failure mode that I discuss in Release It!: Design and Deploy Production-Ready Software as "Attacks of Self-Denial".  These have all the characteristics of a distributed denial-of-service attack (DDoS), except that a company asks for it.  No, I’m not blaming the victim for electronic vandalism… I mean, they actually ask for the attack.

The anti-pattern goes something like this: marketing conceives of a brilliant promotion, which they send to 10,000 customers.  Some of those 10,000 pass the offer along to their friends.  Some of them post it to sites like FatWallet or TechBargains.  On the appointed day, hour, and minute, the site has a date with destiny as a million or more potential customers hit the deep link that marketing sent around in the email.  You know, the one that bypasses the content distribution network, embeds a session ID in the URL, and uses SSL?

Nearly every retailer I know has done this to themselves at one point.  Two holidays ago, one of my clients did it to themselves, when they announced that XBox 360 preorders would begin at a certain day and time.  Between actual customers and the amateur shop-bots that the tech-savvy segment cobbled together, the site got crushed.  (Yes, this was one where marketing sent the deep link that bypassed all the caching and bot-traps.)

Last holiday, Amazon did it to themselves when they discounted the XBox 360 by $300.  (What is it about the XBox 360?)  They offered a thousand units at the discounted price and got ten million shoppers.  All of Amazon was inaccessible for at least 20 minutes.  (It may not sound like much, but some estimates say Amazon generates $1,000,000 per hour during the holiday season, so that 20-minute outage, a third of an hour, probably cost them over $300,000!)

In Release It!, I discuss some non-technical ways to mitigate this behavior, as well as some design and architecture patterns you can apply to minimize damage when one of these Attacks of Self-Denial occurs.