Wide Awake Developers

Mounds of Filthy Data

| Comments

Data is the future.

The barriers to entering online business are pretty low, these days. You can do it with zero infrastructure, which means no capital spent on depreciating assets like servers and switches. Open source operating systems, databases, servers, middleware, libraries, and development tools mean that you don’t spend money on software licenses or maintenance contracts. All you need is an idea, followed by a SMOP.

With both the cost side trending toward zero, how can there be any barrier to entry?

The "classic" answer is the network effect, also known as Metcalfe’s Law. (The word "classic" in web business models means anything more than two years old, of course.) The first Twitter user didn’t get a whole lot out of it. The ten-million-and-first gets a lot more benefit. That makes it tough for a newcomer like Plurk to get an edge.

I see a new model emerging, though. Metcalfe’s Law is part of it, keeping people engaged. The best thing about having users, though, is that they do things. Every action by every user tells you something, if you can keep track of it all.

Twitter gets a lot of its value from the people connected at the endpoints. But, they also get enormous power from being the hub in the middle of it. Imagine what you can do when you see the content of every message passing through a system that large. A few things come to mind right away. You could extract all the links that people are posting to see what’s hot today. (Zeitgeist.) You could use semantic analysis to tell how people feel about current topics, like Presidential candidates in the U.S. You could track product names and mentions to see which products delight people and which cause frustration. You could publish a slang dictionary that actually keeps up! The possibilities are enormous.

Ah, I can already sense an objection forming. How the heck is anyone supposed to figure out all that stuff from noisy, messy textual human communication? We’re cryptic, ironic, and oblique. We sometimes mean the exact opposite of what we say. Any machine intelligence that tries to grok all of Twitter will surely self-destruct, right? That supposed "data" is just a big steaming pile of human contradictions!

In my view, though, it’s the dirtiness of the data that makes it beautiful. Yes, there will be contradictions. There will be ironic asides. But, those will come out in the wash. They’ll be balanced out by the sincere, meaningful, or obvious. Not every message will be semantically clear or consistent, but given enough messy data, clear patterns will still emerge.

There’s the key: enough data to see patterns. Large amounts. Huge amounts. Vast piles of filthy data.

Over the next couple of days, I’ll post a series of entries exploring how to amass dirty data, who’s got a natural advantage, and programming models that work with it.

Hard Problems in Architecture

| Comments

Many of the really hard problems in web architecture today exist outside the application server.  Here are three problems that haven’t been solved. Partial solutions exist today, but nothing comprehensive.

Uncontrolled demand

Users tend to arrive at web sites in huge gobs, all at once. As the population of the Net continues to grow, and the need for content aggregators/filters grows, the "front page" effect will get worse.

One flavor of this is the "Attack of Self-Denial", an email, radio, or TV campaign that drives enough traffic to crash the site.  Marketing can slow you down. Really good marketing can kill you at any time.

Versioning, deployment and rollback

With large scale applications, as with large enterprise integrations, versioning rapidly becomes a problem. Versioning of schemas, static assets, protocols, and interfaces. Add in a dash of SOA, and you have a real nightmare. You can count on having at least one interface broken at any given time. Or, you introduce such powerful governors that nothing ever changes.

As the number of nodes increases, you eventually find that there’s always at least one deployment going on. A "deployment" becomes less of a point-in-time activity than it is a rolling wave. A new service version will take hours or days to be deployed to every node. In the meantime, both the old and new service version must coexist peacefully. Since both service versions will need to support multiple protocol versions (see above) you have a combinatorial problem.

And, of course, some of these deployments will have problems of their own. Today, many application deployments are "one way" events. The deployment process itself has irreversably destructive effects. This will have to change, so every deployment can be done both forward and back. Oh, and every deployment will also be deploying assets to multiple targets—web, application, and database—while also triggering cache flushes and, possibly, metadata changes to external partners like Akamai.

Applications will need to participate in their own versioning, deployment, and management. 

Blurring the lines

There used to be a distinction between the application and the infrastructure it ran on. That meant you could move applications around and they would behave pretty much the same in a development environment as in the real production environment. These days, firewalls, load balancers, and caching appliances blur the lines between "infrastructure" and "application". It’s going to get worse as the lines between "operational support system" and "production applications" get blurred, too. Automated provisioning, SLA management, and performance management tools will all have interactions with the applications they manage. These will inevitably introduce unexpected interactions… in a word, bugs.

Creeping Fees

| Comments

A couple of years ago, the Minneapolis-St. Paul airport introduced self-pay parking gates. Scan a credit card on the way in and on the way out, and it just debits the card. This obviously saves money on parking attendants, and it’s pretty convenient for parkers.

At first, to encourage adoption, they offered a discount of $2 per day. Every time you’d approach the entry, a friendly voice from a Douglas Adams novel would ask, "Would you like to save $2 per day on parking?" For general parking, that meant $14 instead of $16 per day.

Some time later, this switched from being an incentive for adopting the system to a penalty for avoiding it. How? They raised the rates by $2 per day. So now, the top rate if you use self-pay is back to $16. If you don’t use it, then your top rate bumped up to $18. Clearly they put somebody from the banking industry in charge of this parking system.

Now, it’s changed again, from $2 per day to $2 per transaction. So it’s just $2 off the top of whatever your overall parking fees are.

This gradual creep is really interesting. I wonder what the next step will be. A $2 per year discount would be one way to approach it. Maybe a "frequent parker" program. More likely the discount will drop to $1 per transaction, or it will just be discarded altogether.

That’s OK with me, because swiping the credit card is still more convenient than exchanging cash money with a human anyway.

Besides, back when it was cash based, I always got tagged with the ATM fee anyway. 

Word Cloud Bandwagon

| Comments

Wordle has been meming it’s way around the ‘Net lately.  Figured I’d join the crowd by doing a word cloud for Release It.  This is from the preface.


Considering that this is just from fairly simple text analysis, I’m surprised at how accurately it represents the key concerns. "Software" and "money" have roughly equal prominence. "Life" appears near the middle, along with "excitement", "revenue", "production" and "systems". Not bad for an algortihm.

Webber and Fowler on SOA Man-Boobs

| Comments

InfoQ posted a video of Jim Webber and Martin Fowler doing a keynote speech at QCon London this Spring. It’s a brilliant deconstruction of the concept of the Enterprise Service Bus. I can attest that they’re both funny and articulate (whether on the stage or off.)

Along the way, they talk about building services incrementally, delivering value at every step along the way. They advocate decentralized control and direct alignment between services and the business units that own them. 

I agree with every word, though I’m vaguely uncomfortable with how often they say "enterprise man boobs".

Coincidence or Back-end Problem?

| Comments

An odd thing happened to me today. Actually, an odd thing happened yesterday, but it’s having the same odd thing happen today that really makes it odd. With me so far?

Yesterday, while I was shopping at Amazon, Amazon told me that my American Express card had expired. While it is set for a May expiration, it’s several years in the future. I didn’t think too much of it, because when I re-entered the same information, Amazon accepted it.

Today, I got the same thing with the same card on iTunes!

Online stores don’t do a whole lot with your credit cards. For the most part, they just make a call out to a credit card processor. Small stores have to go through a second-tier CCVS system that charges a few pennies per transaction. Large ones—and do they get larger than Amazon?—generally connect directly to a payment processor. The payment processor may charge a fraction of a cent per transaction, but they definitely make it up in volume.

(There are other business factors, too, like the committed transaction volume, response time SLAs, and the like.)

Asynchronously, the payment processor collects from the issuing bank. It’s the issuing bank that actually bills you, and sets your interest rate and payment terms.

Whereas VISA and MasterCard work with thousands of issuers, American Express doesn’t. When you get an AmEx card, they are the issuing bank as well as the payment processor.

Which makes it highly suspect that the same card gave me the same error through two different sites. It makes me think that American Express has introduced a bug in their validation system, causing spurious declines for expiration. 

Social Factors

| Comments

I mentioned Tom DeMarco just a couple of days ago. I’m re-reading his great book, Why Does Software Cost So Much? for the first time in about ten years.

Personally, I credit Tom as one of the unsung progenitors of the agile movement. Long before we had "Agile" or even "lightweight methods", Tom was talking about the psycho-social nature of software development. 

For instance, here’s an excerpt from essay 8, "Nontechnological Issues in Software Engineering":

Imagine your boss just plunked a specification on your desk and asked, "How long will it take you and one other person to get this job done?" What’s the first question out of your mouth?

Would you ask, "Can we use object-oriented methods?" or "What CASE system can we buy?" or "Is it okay to use rapid prototyping?" Of course not. Your first question is,

Who is the other person?

Absolutely. Right on, Tom. 


| Comments

A friend invited me to Plurk. So far, I’ve resisted Twitter for no good reason (other than a vague sense of social insecurity.) I figure I’ll dip my toe into Plurk, though.

This link is an open invite to Plurk. It’ll let anyone join. Fair warning, it’s also a "friend" link. 

Six Word Methods

| Comments

In his great collection of essays Why Does Software Cost So Much?, Tom DeMarco makes the interesting point that the software industry had grown from zero to $300 billion dollars (in 1993). This indicates that the market had at least $300B worth of demand for software, even while complaining continuously about the cost and quality of the very same software. It seems to me that the demand for software production, together with the time and cost pressures, has only increased dramatically since then.

(DeMarco enlightens us that the perennial question, “Why does software cost so much?” is not really a question at all, but rather a goad or a negotiation. Also very true.)

Fundamentally, the demand for software production far outstrips our industry’s ability to supply it. In fact, I believe that we can classify most software methods and techniques by their relation and response to the problem of surplus demand. Some try to optimize for least-cost production, others for highest quality, still others for shortest cycle time.

In the spirit of six-word memoirs, here are the sometimes dubious responses that various technology and development methods’ offer to the overwhelming demand for software production. 

Waterfall: Nevermind backlog, requirements were signed off.

RAD: Build prototypes faster than discarding them.

Offshore outsourcing: Army of cheap developers producing junk.

Onshore outsourcing: Same junk, but with expensive developers.

Agile: Avoid featuritis; outrun pesky business users.

Domain-specific languages: Compress every problem into one-liners.

CMMi: Enough Process means nothing’s ever wasted.

Relational Databases: Code? Who cares? Data lives forever.

Model-driven architecture: Jackson Pollack’s models into inscrutable code.

Web Services: Terrorize XML until maximum reuse achieved.

FORTH: backward writing IF punctuation time SAVE.

SOA: Iron-fisted governance ensures total calcification.

Intentional programming: Parallelize programming… make programmers of everyone.

Google as IDE: It’s been done, probably in Befunge.

Open-source: Bury the world in abandoned code.

Mashups: Parasitize others’ apps, then APIs change.

LISP: With enough macros, one uberprogrammer sufficies.

perl: Too busy coding to maintain anyway.

Ruby: Meta-programming: same problems, mysterious solutions.

Ocaml: No, try meta-meta-meta-programming.

Groovy: Faster Java coding, runs like C-64.

Software-as-a-Service: Don’t write your own, rent ours.

Cloud Computing: Programmers would go faster without administrators.

New Article: S2AP + Eclipse + Maven Walkthrough

| Comments

See Getting Started With SpringSource Application Platform, Eclipse, and Maven.

Most of the information out there about programming in S2AP is in blogs or references to really old OSGi tutorials. It took me long enough to configure some basic Eclipse project support that I figured it was worth writing down. All of the frameworks and tool sets are very flexible, which means you have more choices to deal with when setting up a project. Sometimes, being concrete helps… there may be a lot of options, but when it’s time to do a project, you only care about one set of choices for those options. This guide is completely specific to using Eclipse to write bundle projects for SpringSource Application Platform.

If that’s your specific set of needs, great! If not, that’s OK too, because the beauty of the Web is that somebody else will have a tutorial on your exact combination, too.