Wide Awake Developers

Agile IT! Experience


On June 26-28, 2008, I’ll be speaking at the inaugural Agile IT! Experience symposium in Reston, VA. Agile ITX is about consistently delivering better software. It’s for development teams and management, working and learning together.

It’s a production of the No Fluff, Just Stuff symposium series.  Like all NFJS events, attendance is capped, so be sure to register early.

From the announcement email:

The central theme of the Agile ITX conference (www.agileitx.com) is to help your development team/management consistently deliver better software. We’ll focus on the entire software development life cycle, from requirements management to test automation to software process. You’ll learn how to Develop in Iterations, Collaborate with Customers, and Respond to Change. Software is a difficult field with high rates of failure. Our world-class speakers will help you implement best practices, deal with persistent problems, and recognize opportunities to improve your existing practices.

Dates: June 26-28, 2008

Location: Sheraton Reston

Attendance: Developers / Technical Management

Sessions at Agile ITX will cover topics such as:

  • Continuous Integration (CI)
  • Test Driven Development (TDD)
  • Testing Strategies
  • Team Building
  • Agile Architecture
  • Dependency Management
  • Code Metrics & Analysis
  • Acceleration & Automation
  • Code Quality

Agile ITX speakers are successful leaders, authors, mentors, and trainers who have helped thousands of developers create better software. You will have the opportunity to hear and interact with:

Jared Richardson - co-author of Ship It!
Michael Nygard - author of Release It!
Johanna Rothman - author of Manage It!
Esther Derby - co-author of Behind Closed Doors: Secrets of Great Management
Venkat Subramaniam - co-author of Practices of an Agile Developer
David Hussman - Agility Instructor/Mentor
Andrew Glover - co-author of Continuous Integration
J.B. Rainsberger - author of JUnit Recipes
Neal Ford - Application Architect at ThoughtWorks
Kirk Knoernschild - contributor to The Agile Journal
Chris D’Agostino - CEO of Near Infinity
David Bock - Principal Consultant with CodeSherpas
Mark Johnson - Director of Consulting at CGI
Ryan Shriver - Managing Consultant with Dominion Digital
John Carnell - IT Architect at Thrivent Financial
Scott Davis - Testing Expert

Amazon Blows Away Objections


Amazon must have been burning more midnight oil than usual lately.

Within the last two weeks, they’ve announced three new features that basically eliminate any remaining objections to their AWS computing platform.

Elastic IP Addresses 

Elastic IP addresses solve a major problem on the front end.  When an EC2 instance boots up, the "cloud" assigns it a random IP address. (Technically, it assigns two: one external and one internal.  For now, I’m only talking about the external IP.) With a random IP address, you’re forced to use some kind of dynamic DNS service such as DynDNS. That lets you update your DNS entry to connect your long-lived domain name with the random IP address.

Dynamic DNS services work pretty well, but not universally well. For one thing, there is a small amount of delay. Dynamic DNS works by setting a very short time-to-live (TTL) on the DNS entries, which instructs intermediate DNS servers to cache the entry for only a few minutes. Even when that works, you still have a few minutes of downtime when you need to reassign your DNS name to a new IP address. And for some parts of the Net, dynamic DNS doesn’t work well at all, usually because an ISP doesn’t respect the TTL on DNS entries and caches them for a longer time.
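
You can check the TTL on any DNS entry with a quick lookup. A minimal illustration, using a hypothetical hostname and a 60-second TTL of the sort dynamic DNS providers typically set:

$ dig +noall +answer www.example.com
www.example.com.    60    IN    A    75.101.158.25

The second column is the TTL in seconds: the longest any well-behaved resolver should cache the answer.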

Elastic IP addresses solve this problem. You request an elastic IP address through a Web Services call.  The easiest way is with the command-line API:

$ ec2-allocate-address
ADDRESS    75.101.158.25   

Once the address is allocated, you own it until you release it. At this point, it’s attached to your account, not to any running virtual machine. Still, this is good enough to go update your domain registrar with the new address. After you start up an instance, you can attach the address to the machine. If the machine goes down, the address is detached from that instance, but you still "own" it.
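
Attaching the address is one more call through the same command-line API. A sketch, with a hypothetical instance ID:

# Bind the elastic IP allocated above to a running instance
$ ec2-associate-address -i i-3ea74257 75.101.158.25
ADDRESS    75.101.158.25    i-3ea74257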

So, for a failover scenario, you can reassign the elastic IP address to another machine, leave your DNS settings alone, and all traffic will now come to the new machine.

Now that we’ve got elastic IPs, there’s just one piece missing from a true HA architecture: load distribution. With just one IP address attached to one instance, you’ve got a single point of failure (SPOF). Right now, there are two viable options to solve that. First, you can allocate multiple elastic IPs and use round-robin DNS for load distribution. Second, you can attach a single elastic IP address to an instance that runs a software load balancer: pound, nginx, or Apache+mod_proxy_balancer. (It wouldn’t surprise me to see Amazon announce an option for load-balancing-in-the-cloud soon.) You’d run two of these, with the elastic IP attached to one at any given time. Then, you need a third instance monitoring the other two, ready to flip the IP address over to the standby instance if the active one fails. (There are already some open-source and commercial products to make this easy, but that’s the subject for another post.)
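
For the record, the flip itself is only two calls, which that monitoring instance would issue when it detects a failure. A sketch, with hypothetical instance IDs:

# Detach the elastic IP from the failed instance...
$ ec2-disassociate-address 75.101.158.25

# ...then attach it to the standby load balancer
$ ec2-associate-address -i i-5fb8c336 75.101.158.25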

Availability Zones 

The second big gap that Amazon closed recently deals with geography.

In the first rev of EC2, there was absolutely no way to control where your instances were running. In fact, there wasn’t any way inside the service to even tell where they were running. (You had to resort to ping tracing or geomapping of the IPs.) This presents a problem if you need high availability, because you really want more than one location.

Availability Zones let you specify where your EC2 instances should run. You can get a list of them through the command-line (which, let’s recall, is just a wrapper around the web services):

$ ec2-describe-availability-zones
AVAILABILITYZONE    us-east-1a    available
AVAILABILITYZONE    us-east-1b    available
AVAILABILITYZONE    us-east-1c    available

Amazon tells us that each availability zone is built independently of the others. That is, they might be in the same building or separate buildings, but they have their own network egress, power systems, cooling systems, and security. Beyond that, Amazon is pretty opaque about the availability zones. In fact, not every AWS user will see the same availability zones. They’re mapped per account, so "us-east-1a" for me might map to a different hardware environment than it does for you.

How do they come into play? Pretty simply, as it turns out. When you start an instance, you can specify which availability zone you want to run it in.
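
Launching into a particular zone is just one more flag on the run command. A sketch, with a hypothetical AMI ID and keypair:

# Start an instance in the us-east-1b availability zone
$ ec2-run-instances ami-1a2b3c4d -k my-keypair -z us-east-1b

Leave off the -z flag and EC2 picks a zone for you, which is exactly the old behavior.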

Combine these two features, and you get a bunch of interesting deployment and management options.

Persistent Storage

Storage has been one of the most perplexing issues with EC2. Simply put, anything you stored to disk while your instance was running would be lost when you restart the instance. Instances always go back to the bundled disk image stored on S3.

Amazon has just announced that they will be supporting persistent storage in the near future. A few lucky users get to try it out now, in its pre-beta incarnation.

With persistent storage, you can allocate space in chunks from 1 GB to 1 TB.  That’s right, you can make one web service call to allocate a freaking terabyte! Like IP addresses, storage is owned by your account, not by an individual instance. Once you’ve started up an instance—say a MySQL server, for example—you attach the storage volume to it. To the virtual machine, the storage looks just like a device, so you can use it raw or format it with whatever filesystem you want.
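
The tools for this aren’t public yet, so take the specifics with a grain of salt, but based on the rest of the API I’d expect the workflow to look roughly like this (the size, volume ID, instance ID, and device name are all hypothetical):

# Allocate a 100 GB volume in the same availability zone as the instance
$ ec2-create-volume --size 100 -z us-east-1a

# Attach it to the running MySQL instance as a block device
$ ec2-attach-volume vol-4d826724 -i i-3ea74257 -d /dev/sdh

# Inside the instance, format and mount it like any local disk
$ mkfs.ext3 /dev/sdh
$ mount /dev/sdh /var/lib/mysql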

Best of all, because this is basically a virtual SAN, you can do all kinds of SAN tricks, like snapshot copies for backups to S3.

Persistent storage done this way obviates some of the other dodgy efforts that have been going on, like FUSE-over-S3 or the S3 storage engine for MySQL.

SimpleDB is still there, and it’s still much more scalable than plain old MySQL data storage, but we’ve got scores of libraries for programming with relational databases, and very few that work with key-value stores. For most companies, and for the foreseeable future, programming to a relational model will be the easiest thing to do. This announcement lowers the barrier to entry even further.

 

With these announcements, Amazon has cemented AWS as a viable computing platform for real businesses.

Geography Imposes Itself on the Clouds


In a comment to my last post, gvwilson asks, "Are you aware that the PATRIOT Act means it’s illegal for companies based in Ontario, BC, most European jurisdictions, and many other countries to use S3 and similar services?"

This is another interesting case of the non-local networked world intersecting with real geography. Not surprisingly, it quickly becomes complex. 

I have heard some of the discussion about S3 and the interaction between the USA PATRIOT Act and the EU and Canadian privacy laws. I’m not a lawyer, but I’ll relate the discussion for other readers who haven’t been tracking it.

Canada and the European Union have privacy laws that lean toward their citizens and are quite protective of them. In the U.S., where privacy laws exist at all, they are heavily biased in favor of large data-collecting corporations, such as credit rating agencies. A key provision of the privacy laws in Canada and the EU is that companies cannot transmit private data to any jurisdiction that lacks substantially similar protections. It’s kind of like the "incorporation" clause in the GPL that way.

In the U.S., particularly with respect to the USA PATRIOT Act, companies are required to turn over private customer data to a variety of government agencies. In some cases, they are required to do this even without a search warrant or court order. These are pretty much just fishing expeditions: casting a broad net to see if you catch anything. Therefore, the EU/Canadian privacy laws judge that the U.S. does not have substantially similar privacy protections, and companies in those covered nations are barred from exporting, transmitting, or storing customer data in any U.S. location where it might be subject to PATRIOT Act search.

(Strictly speaking, this is not just a PATRIOT Act problem. It also relates to RICO and a wide variety of other U.S. laws, mostly aimed at tracking down drug dealers by their banking transactions.)

Enter S3. S3 was built to be a geographically replicated, distributed storage mechanism! There is no way even to figure out where the individual bits of your data are physically located. Nor is there any way to tell Amazon what legal jurisdictions your data can, or must, reside in. This is a big problem for personal customer data. It’s also a problem that Amazon knows it must solve. For EC2, they recently introduced Availability Zones that let you define what geographic location your virtual servers will exist in. I would expect to see something similar for S3.

This would also appear to be a problem for EU and Canadian companies using Google’s AppEngine. It does not offer any way to confine data to specific geographies, either.

Does this mean it’s illegal for Canadian companies to use S3? Not in general. Web pages, software downloads, media files… these would all be allowed.  Just stay away from the personal data.

Suggestions for a 90-minute App


Some of you know my obsession with Lean, Agile, and ToC. Ideas are everywhere. An idea is nothing. Execution is everything.

In that vein, one of my No Fluff, Just Stuff talks is called "The 90 Minute Startup".  In it, I build a real, live dotcom site during the session. You can’t get a much shorter time-to-market than 90 minutes, and I really like that.

In case you’re curious, I do it through the use of Amazon’s EC2 and S3 services. 

The app I’ve used for the past couple of sessions is a quick and dirty GWT app that implements a Net Promoter Score survey about the show itself. It has a little bit of AJAX-y stuff to it, since GWT makes that really, really simple. On the other hand, it’s not all that exciting as an application. It certainly doesn’t make anyone sit up and go "Wow!"

So, anyone want to offer up a suggestion for a "Wow!" app they’d like to see built and deployed in 90 minutes or less?  Since this is for a talk, it should be about the size of one user story. I doubt I’ll be taking live requests from the audience during the show, but I’m happy to take suggestions here in the comments.

(Please note: thanks to the pervasive evil of blog comment spam, I moderate all comments here. If you want to make a suggestion, but don’t want it published, just make a note of that in the comment.) 


Google’s AppEngine Appears, Disappoints


Google finally got into the cloud infrastructure game, announcing their Google AppEngine. As rumored, AppEngine opens parts of Google’s legendary scalable infrastructure for hosted applications.

AppEngine is in beta, with only 10,000 accounts available. They’re already long gone, but you can download the SDK and run a local container.
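
The local container is just a script in the SDK. A quick sketch, assuming your app lives in a directory named myapp with an app.yaml inside:

# Serve the app locally on the SDK's development server
$ python dev_appserver.py myapp/

# Once you have an account, push it to Google with:
$ python appcfg.py update myapp/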

Here are some quick pros and cons:

Pro

  • Dynamically scalable
  • Good lifecycle management
  • Quota-based management for cost containment

Con

  • Python apps only
  • You deploy code, not virtual machines
  • Web apps only

At this point, I’m a bit underwhelmed. Essentially, they’re providing a virtual scalable app runtime, but not a generalized computing platform. (Similar to Sun’s Project Caroline.) Access to the really cool Google features, like GFS, is through Python APIs that Google provides.

If you fit Google’s profile of a Python-based Web application developer, this could be a very fast path to market with dynamic scalability. Still, I think I’m going to stick with Amazon Web Services instead.

Steve Jobs Made Me Miss My Flight


Or: On my way to San Jose.

On waking, I reach for my blackberry. It tells me what city I’m in; the hotel rooms offer no clues. Every Courtyard by Marriott is interchangeable.  Many doors into the same house. From the size of my suitcase, I can recall the length of my stay: one or two days, the small bag.  Three or four, the large. Two bags means more than a week.

CNBC, shower, coffee, email. Quick breakfast, $10.95 (except in California, where it’s $12.95. Another clue.)

Getting there is the worst part. Flying is an endless accumulation of indignities. Airlines learned their human factors from hospitals. I’ve adapted my routine to minimize hassles.

Park in the same level of the same ramp. Check in at the less-used kiosks in the transit level. Check my bag so I don’t have to fuck around with the overhead bins. I’d rather dawdle at the carousel than drag the thing around the terminal anyway.

Always the frequent flyer line at the security checkpoint. Sometimes there’s an airline person at the entrance of that line to check my boarding pass, sometimes not. An irritation. I’d rather it was always, or never. Sometimes means I don’t know if I need my boarding pass out or not.

Same words to the TSA agent.  Standard responses. "Doing fine," whether I am or not.  Same belt.  It’s gone through the metal detector every time. I don’t need to take it off.

Only… today, something is different. Instead of my bags trundling through the x-ray machine, she stops the belt.  Calls over another agent, a palaver. Another agent flocks to the screen. A gabble, a conference, some consternation.

They pull my laptop, my new laptop making its first trip with me, out of the flow of bags. One takes me aside to a partitioned cubicle. Another of the endless supply of TSA agents takes the rest of my bags to a different cubicle. No yellow brick road here, just a pair of yellow painted feet on the floor, and my flight is boarding. I am made to understand that I should stand and wait.  My laptop is on the table in front of me, just beyond reach, like I am waiting to collect my personal effects after being paroled.

I’m standing, watching my laptop on the table, listening to security clucking just behind me. "There’s no drive," one says. "And no ports on the back. It has a couple of lines where the drive should be," she continues.

A younger agent joins the crew. I must now be occupying ten, perhaps twenty, percent of the security force. At this checkpoint anyway. There are three score more at the other five checkpoints. The new arrival looks at the printouts from x-ray, looks at my laptop sitting small and alone. He tells the others that it is a real laptop, not a "device". That it has a solid-state drive instead of a hard disc. They don’t know what he means. He tries again, "Instead of a spinning disc, it keeps everything in flash memory." Still no good. "Like the memory card in a digital camera." He points to the x-ray, "Here. That’s what it uses instead of a hard drive."

The senior agent hasn’t been trained for technological change. New products on the market? They haven’t been TSA approved. Probably shouldn’t be permitted. He requires me to open the "device" and run a program. I do, and despite his inclination, the lead agent decides to release me and my troublesome laptop.  My flight is long gone now, so I head for the service center to get rebooked.

Behind me, I hear the younger agent, perhaps not realizing that even the TSA must obey TSA rules, repeating himself.

"It’s a MacBook Air."

The Granularity Problem


I spend most of my time dealing with large sites. They’re always hungry for more horsepower, especially if they can serve more visitors with the same power draw. Power goes up much faster with more chassis than with more CPU cores. Not to mention, administrative overhead tends to scale with the number of hosts, not the number of cores. For them, multicore is a dream come true.

I ran into an interesting situation the other day, on the other end of the spectrum.

One of my team was working with a client that had relatively modest traffic levels. They’re in a mature industry with a solid, but not rabid, customer base. Their web traffic needs could easily be served by one Apache server running one CPU and a couple of gigs of RAM.

The smallest configuration we could offer, and still maintain SLAs, was two hosts, with a total of 8 CPU cores running at 2 GHz, 32 gigs of RAM, and 4 fast Ethernet ports.

Of course that’s oversized! Of course it’s going to cost more than it should! But at this point in time, if we’re talking about dedicated boxes, that’s the smallest configuration we can offer! (Barring some creative engineering, like using fully depreciated "classics" hardware that’s off its original lease, but still has a year or two before EOL.)

As CPUs get more cores, the minimum configuration is going to become more and more powerful. The quantum of computing is getting larger.

Not every application will need that much power, and that’s another reason I think private clouds make a lot of sense. Companies can buy big boxes, then allocate them to specific applications in fractions. That gains cost efficiency in administration, power, and space consumption (though not heat production!) while still letting business units optimize their capacity downward to meet their actual demand.