Wide Awake Developers


Agile IT! Experience

On June 26-28, 2008, I'll be speaking at the inaugural Agile IT! Experience symposium in Reston, VA. Agile ITX is about consistently delivering better software. It's for development teams and management, working and learning together.

It's a production of the No Fluff, Just Stuff symposium series.  Like all NFJS events, attendance is capped, so be sure to register early.

From the announcement email:

The central theme of the Agile ITX conference (www.agileitx.com) is to help your development team/management consistently deliver better software. We'll focus on the entire software development life cycle, from requirements management to test automation to software process. You'll learn how to Develop in Iterations, Collaborate with Customers, and Respond to Change. Software is a difficult field with high rates of failure. Our world-class speakers will help you implement best practices, deal with persistent problems, and recognize opportunities to improve your existing practices.

Dates: June 26-28, 2008

Location: Sheraton Reston

Attendance: Developers / Technical Management

Sessions at Agile ITX will cover topics such as:

  • Continuous Integration (CI)
  • Test Driven Development (TDD)
  • Testing Strategies, Team Building
  • Agile Architecture
  • Dependency Management
  • Code Metrics & Analysis
  • Acceleration & Automation
  • Code Quality

Agile ITX speakers are successful leaders, authors, mentors, and trainers who have helped thousands of developers create better software. You will have the opportunity to hear and interact with:

Jared Richardson - co-author of Ship It!
Michael Nygard - author of Release It!
Johanna Rothman - author of Manage It!
Esther Derby - co-author of Behind Closed Doors: Secrets of Great Management
Venkat Subramaniam - co-author of Practices of an Agile Developer
David Hussman - Agility Instructor/Mentor
Andrew Glover - co-author of Continuous Integration
J.B. Rainsberger - author of JUnit Recipes
Neal Ford - Application Architect at ThoughtWorks
Kirk Knoernschild - contributor to The Agile Journal
Chris D'Agostino - CEO of Near Infinity
David Bock - Principal Consultant with CodeSherpas
Mark Johnson - Director of Consulting at CGI
Ryan Shriver - Managing Consultant with Dominion Digital
John Carnell - IT Architect at Thrivent Financial
Scott Davis - Testing Expert

Amazon Blows Away Objections

Amazon must have been burning more midnight oil than usual lately.

Within the last two weeks, they've announced three new features that basically eliminate any remaining objections to their AWS computing platform.

Elastic IP Addresses 

Elastic IP addresses solve a major problem on the front end.  When an EC2 instance boots up, the "cloud" assigns it a random IP address. (Technically, it assigns two: one external and one internal.  For now, I'm only talking about the external IP.) With a random IP address, you're forced to use some kind of dynamic DNS service such as DynDNS. That lets you update your DNS entry to connect your long-lived domain name with the random IP address.

Dynamic DNS services work pretty well, but not universally well. For one thing, there is a small amount of delay. Dynamic DNS works by setting a very short time-to-live (TTL) on the DNS entries, which instructs intermediate DNS servers to cache the entry for only a few minutes. Even when that works well, you still have a few minutes of downtime whenever you need to reassign your DNS name to a new IP address. And in some parts of the Net, dynamic DNS doesn't work well at all, usually because an ISP ignores the TTL on DNS entries and caches them for much longer.
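You can check the TTL on any DNS record yourself with dig; the second column of each answer line is the remaining TTL in seconds. The record below is a made-up illustration, not a real host:

$ dig +noall +answer www.example.com A
www.example.com.    60    IN    A    192.0.2.1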

Elastic IP addresses solve this problem. You request an elastic IP address through a Web Services call.  The easiest way is with the command-line API:

$ ec2-allocate-address
ADDRESS    75.101.158.25   

Once the address is allocated, you own it until you release it. At this point, it's attached to your account, not to any running virtual machine. Still, this is good enough to go update your domain registrar with the new address. After you start up an instance, then you can attach the address to the machine. If the machine goes down, then the address is detached from that instance, but you still "own" it.

So, for a failover scenario, you can reassign the elastic IP address to another machine, leave your DNS settings alone, and all traffic will now come to the new machine.
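Reassigning the address is just one more web service call. Here's what it looks like with the command-line tools; the instance ID is a made-up placeholder for your standby machine:

$ ec2-associate-address 75.101.158.25 -i i-3ea74257
ADDRESS    75.101.158.25    i-3ea74257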

Now that we've got elastic IPs, there's just one piece missing from a true HA architecture: load distribution. With just one IP address attached to one instance, you've got a single point of failure (SPOF). Right now, there are two viable options to solve that. First, you can allocate multiple elastic IPs and use round-robin DNS for load distribution. Second, you can attach a single elastic IP address to an instance that runs a software load balancer: pound, nginx, or Apache+mod_proxy_balancer. (It wouldn't surprise me to see Amazon announce an option for load-balancing-in-the-cloud soon.) You'd run two of these, with the elastic IP attached to one at any given time. Then, you need a third instance monitoring the other two, ready to flip the IP address over to the standby instance if the active one fails. (There are already some open-source and commercial products to make this easy, but that's the subject for another post.)
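To give a flavor of that monitoring instance, here's a deliberately naive sketch in shell. It just pings the active machine and flips the elastic IP to the standby when the pings stop coming back. The IP and instance ID are hypothetical placeholders, and a real setup would use a smarter health check than ping:

#!/bin/sh
# Naive failover monitor (sketch). Assumes the EC2 command-line tools
# are installed and your AWS credentials are set up in the environment.
ELASTIC_IP=75.101.158.25      # the address clients connect to
STANDBY_INSTANCE=i-5f2a9c11   # hypothetical standby instance ID

while true; do
  if ! ping -c 3 "$ELASTIC_IP" > /dev/null 2>&1; then
    # The active machine stopped answering: move the IP to the standby.
    ec2-associate-address "$ELASTIC_IP" -i "$STANDBY_INSTANCE"
    break
  fi
  sleep 15
done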

Availability Zones 

The second big gap that Amazon closed recently deals with geography.

In the first rev of EC2, there was absolutely no way to control where your instances were running. In fact, there wasn't any way inside the service to even tell where they were running. (You had to resort to ping times, traceroutes, or geo-mapping the IP addresses.) This presents a problem if you need high availability, because you really want your machines in more than one location.

Availability Zones let you specify where your EC2 instances should run. You can get a list of them through the command-line (which, let's recall, is just a wrapper around the web services):

$ ec2-describe-availability-zones
AVAILABILITYZONE    us-east-1a    available
AVAILABILITYZONE    us-east-1b    available
AVAILABILITYZONE    us-east-1c    available

Amazon tells us that each availability zone is built independently of the others. That is, they might be in the same building or separate buildings, but they have their own network egress, power systems, cooling systems, and security. Beyond that, Amazon is pretty opaque about the availability zones. In fact, not every AWS user will see the same availability zones. They're mapped per account, so "us-east-1a" for me might map to a different hardware environment than it does for you.

How do they come into play? Pretty simply, as it turns out. When you start an instance, you can specify which availability zone you want to run it in.
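For example, launching an instance in a particular zone is just one more flag (the AMI ID here is a made-up placeholder):

$ ec2-run-instances ami-12345678 -z us-east-1b

Leave off the -z flag and EC2 picks a zone for you.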

Combine these two features, and you get a bunch of interesting deployment and management options.

Persistent Storage

Storage has been one of the most perplexing issues with EC2. Simply put, anything you stored to the local disk while your instance was running would be lost when the instance terminated. Instances always go back to the bundled disk image stored on S3.

Amazon has just announced that they will be supporting persistent storage in the near future. A few lucky users get to try it out now, in its pre-beta incarnation.

With persistent storage, you can allocate space in chunks from 1 GB to 1 TB.  That's right, you can make one web service call to allocate a freaking terabyte! Like IP addresses, storage is owned by your account, not by an individual instance. Once you've started up an instance (say, a MySQL server), you attach the storage volume to it. To the virtual machine, the storage looks just like a block device, so you can use it raw or format it with whatever filesystem you want.

Best of all, because this is basically a virtual SAN, you can do all kinds of SAN tricks, like snapshot copies for backups to S3.
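The tools for this aren't public yet, so take the following as a sketch of what I'd expect, by analogy with the rest of the API. The command names, volume ID, and instance ID are all my assumptions, not Amazon's documentation:

$ ec2-create-volume -s 100 -z us-east-1a
$ ec2-attach-volume vol-4d826724 -i i-3ea74257 -d /dev/sdh
$ ec2-create-snapshot vol-4d826724

That is: create a 100 GB volume in a zone, attach it to an instance as a device, and snapshot it off to S3 for backup.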

Persistent storage done this way obviates some of the other dodgy efforts that have been going on, like  FUSE-over-S3, or the S3 storage engine for MySQL.

SimpleDB is still there, and it's still much more scalable than plain old MySQL data storage, but we've got scores of libraries for programming with relational databases, and very few that work with key-value stores. For most companies, and for the foreseeable future, programming to a relational model will be the easiest thing to do. This announcement lowers the barrier to entry even further.

 

With these announcements, Amazon has cemented AWS as a viable computing platform for real businesses.

Geography Imposes Itself On the Clouds

In a comment to my last post, gvwilson asks, "Are you aware that the PATRIOT Act means it's illegal for companies based in Ontario, BC, most European jurisdictions, and many other countries to use S3 and similar services?"

This is another interesting case of the non-local networked world intersecting with real geography. Not surprisingly, it quickly becomes complex. 

I have heard some of the discussion about S3 and the interaction between the U.S. PATRIOT Act and the EU and Canadian privacy laws. I'm not a lawyer, but I'll relate the discussion for other readers who haven't been tracking it.

Canada and the European Union have privacy laws that lean toward their citizens and are quite protective of them. In the U.S., where privacy laws exist at all, they are heavily biased in favor of large data-collecting corporations, such as credit rating agencies. A key provision of the Canadian and EU privacy laws is that companies cannot transmit private data to any jurisdiction that lacks substantially similar protections. It's kind of like the "incorporation" clause in the GPL that way.

In the U.S., particularly under the USA PATRIOT Act, companies are required to turn over private customer data to a variety of government agencies, in some cases without even a search warrant or court order. These are pretty much fishing expeditions: casting a broad net to see if anything turns up. As a result, the EU and Canadian privacy laws judge that the U.S. does not have substantially similar privacy protections, so companies in those countries are barred from exporting, transmitting, or storing customer data in any U.S. location where it might be subject to a PATRIOT Act search.

(Strictly speaking, this is not just a PATRIOT Act problem. It also relates to RICO and a wide variety of other U.S. laws, mostly aimed at tracking down drug dealers by their banking transactions.)

Enter S3. S3 was built to be a geographically replicated, distributed storage mechanism! There is no way even to figure out where the individual bits of your data are physically located. Nor is there any way to tell Amazon which legal jurisdictions your data can, or must, reside in. This is a big problem for personal customer data. It's also a problem that Amazon knows it must solve. For EC2, they recently introduced Availability Zones that let you define what geographic location your virtual servers will exist in. I would expect to see something similar for S3.

This would also appear to be a problem for EU and Canadian companies using Google's AppEngine. It does not offer any way to confine data to specific geographies, either.

Does this mean it's illegal for Canadian companies to use S3? Not in general. Web pages, software downloads, media files... these would all be allowed.  Just stay away from the personal data.

Suggestions for a 90-minute app

Some of you know my obsession with Lean, Agile, and ToC. Ideas are everywhere. An idea is nothing; execution is everything.

In that vein, one of my No Fluff, Just Stuff talks is called "The 90 Minute Startup".  In it, I build a real, live dotcom site during the session. You can't get a much shorter time-to-market than 90 minutes, and I really like that.

In case you're curious, I do it through the use of Amazon's EC2 and S3 services. 

The app I've used for the past couple of sessions is a quick and dirty GWT app that implements a Net Promoter Score survey about the show itself. It has a little bit of AJAX-y stuff to it, since GWT makes that really, really simple. On the other hand, it's not all that exciting as an application. It certainly doesn't make anyone sit up and go "Wow!"

So, anyone want to offer up a suggestion for a "Wow!" app they'd like to see built and deployed in 90 minutes or less?  Since this is for a talk, it should be about the size of one user story. I doubt I'll be taking live requests from the audience during the show, but I'm happy to take suggestions here in the comments.

(Please note: thanks to the pervasive evil of blog comment spam, I moderate all comments here. If you want to make a suggestion, but don't want it published, just make a note of that in the comment.) 


Google's AppEngine Appears, Disappoints

Google finally got into the cloud infrastructure game, announcing their Google AppEngine. As rumored, AppEngine opens parts of Google's legendary scalable infrastructure for hosted applications.

AppEngine is in beta, with only 10,000 accounts available. They're already long gone, but you can download the SDK and run a local container.
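Even without one of those accounts, you can kick the tires locally. With the SDK installed, running an app looks roughly like this (the application directory is a made-up example, and I'm going from the SDK docs here):

$ dev_appserver.py myapp/

Then point a browser at http://localhost:8080/ to see it running. Once you do have an account, appcfg.py update myapp/ is the command that's supposed to push the same code up to Google.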

Here are some quick pros and cons:

Pro

  • Dynamically scalable
  • Good lifecycle management
  • Quota-based management for cost containment

Con

  • Python apps only
  • You deploy code, not virtual machines
  • Web apps only

At this point, I'm a bit underwhelmed. Essentially, they're providing a virtual scalable app runtime, but not a generalized computing platform. (Similar to Sun's Project Caroline.) Access to the really cool Google features, like GFS, is through Python APIs that Google provides.

If you fit Google's profile of a Python-based Web application developer, this could be a very fast path to market with dynamic scalability.  Still, I think I'm going to stick with Amazon Web Services, instead.