<feed xmlns="http://www.w3.org/2005/Atom"><title>michaelnygard.com</title><link rel="self" href="https://michaelnygard.com/"/><updated>2026-05-11T13:02:14-05:00</updated><id>https://michaelnygard.com/</id><generator>Hugo -- gohugo.io</generator><entry><title>AI versus Throughput</title><link href="https://michaelnygard.com/blog/2026/05/ai-versus-throughput/"/><id>https://michaelnygard.com/blog/2026/05/ai-versus-throughput/</id><published>2026-05-11T13:02:14-05:00</published><updated>2026-05-11T13:02:14-05:00</updated><content type="html"><![CDATA[<p>According to the Theory of Constraints (ToC), every system has exactly one bottleneck that determines the throughput of the overall system. ToC originated with manufacturing, but its results have been observed to be quite general.</p>
<p><img src="/images/blog/2026-05-11-ai-versus-throughput/basic-process.png" alt="Linear process of three work stations with station 2 highlighted as the bottleneck"></p>
<p>In the diagram above, the whole process cannot deliver any faster than Station 2 can transform its buffer stock of unfinished inventory into its own output. Suppose you make Station 3 run ten times faster than it did last week, leaving everything else the same. The overall process throughput won&rsquo;t change. All that happens is that Station 3 is starved of inputs and forced to idle part of the time.</p>
<p>Suppose instead you invest in the throughput of Station 1, increasing it by ten times while again leaving everything else unchanged. The overall process throughput <em>still</em> won&rsquo;t change. Station 1 will just overproduce and build up inventory of unfinished work waiting for Station 2. In some ways this is even worse, because inventory has a carry cost. Not only that, but the bigger the piles of unfinished inventory, the longer it takes something <em>new</em> to get through the whole process, since the new piece has to wait in those queues of unfinished work.</p>
<p>The only place you want to invest in capacity improvement is at the bottleneck. Increase capacity at Station 2 by ten times and you should get ten times the throughput. Why do I say you &ldquo;should get ten times the throughput&rdquo; instead of &ldquo;you <em>will</em> get ten times the throughput?&rdquo; Two things can go wrong. Station 1 might not be able to produce enough to keep Station 2 busy at its new higher capacity. That would indicate Station 1 is now the new bottleneck. Or Station 3 might not be able to consume enough of Station 2&rsquo;s outputs, causing an inventory buildup. That would mean Station 3 is the new bottleneck.</p>
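<p>As a minimal sketch of that reasoning, assume the throughput of a linear process is simply the capacity of its slowest station. The capacities below (in work items per day) are invented for illustration.</p>
<pre tabindex="0"><code>-- Hypothetical capacities, in work items per day, one per station.
type Capacity = Double

-- A linear process can deliver no faster than its slowest station.
throughput :: [Capacity] -&gt; Capacity
throughput = minimum

main :: IO ()
main = do
  let baseline    = [10, 4, 8]    -- Station 2 is the bottleneck
      fasterThree = [10, 4, 80]   -- Station 3 made ten times faster
      fasterOne   = [100, 4, 8]   -- Station 1 made ten times faster
      fasterTwo   = [10, 40, 8]   -- Station 2 made ten times faster
  mapM_ (print . throughput) [baseline, fasterThree, fasterOne, fasterTwo]
</code></pre><p>With these numbers the first three cases all deliver 4 items per day. The last case delivers 8 rather than 40, because Station 3 has become the new bottleneck.</p>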
<h2 id="manufacturing-software">Manufacturing Software</h2>
<p>Writing code is not making car parts. Even so, it has proven useful to look at the overall <em>system of work</em> in terms of its throughput, cycle time, and failure rate. In an organization that creates software, the inventory looks like spreadsheets, Jira tickets, Git commits, and undeployed pull requests. The work stations are product managers, designers, developers, merge queues, build pipelines, and pre-prod environments.</p>
<p>&ldquo;Throughput&rdquo; here means how many features you deliver in a given time frame. If most of your features are value-producing, then higher throughput means higher profits.</p>
<p>So what happens when you adopt AI driven development? The short version is that you&rsquo;ve increased the capacity of Station 2. Does that mean you&rsquo;ve increased the overall throughput? Maybe, maybe not. You might just be overproducing inventory.</p>
<p><a href="https://github.com">GitHub</a> has recently <a href="https://github.blog/news-insights/company-news/an-update-on-github-availability/">blamed their poor availability</a> on &ldquo;acceleration in agentic development workflow&rdquo;. That sounds like a machine breaking.</p>
<p>Your build pipeline may be breaking too. Is your CI machinery keeping up with the increased PR volume? Reviewing pull requests, running builds, tests, security scans&hellip; these all consume the capacity in the later stages of your delivery process. Watch out for increasing queue depth there! That&rsquo;s unfinished inventory building up and it would mean the bottleneck is not coding but validation and verification (V&amp;V).</p>
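<p>To make the queue-depth warning concrete, here is a rough sketch that assumes constant arrival and service rates and a queue that starts empty. The rates are made-up numbers, not measurements.</p>
<pre tabindex="0"><code>-- Backlog after a number of hours, given arrival and service rates in
-- pull requests per hour. Assumes constant rates and an empty queue at
-- the start; all numbers are hypothetical.
backlogAfter :: Double -&gt; Double -&gt; Double -&gt; Double
backlogAfter arrivalRate serviceRate hours =
  max 0 ((arrivalRate - serviceRate) * hours)

-- Rough wait for a newly submitted PR: the backlog ahead of it divided
-- by the rate at which the pipeline drains that backlog.
waitHours :: Double -&gt; Double -&gt; Double
waitHours backlog serviceRate = backlog / serviceRate

main :: IO ()
main = do
  let serviceRate = 20                               -- PRs/hour the pipeline can validate
      humanPaced  = backlogAfter 15 serviceRate 8    -- arrivals below capacity
      agentPaced  = backlogAfter 30 serviceRate 8    -- arrivals above capacity
  print (humanPaced, waitHours humanPaced serviceRate)
  print (agentPaced, waitHours agentPaced serviceRate)
</code></pre><p>In this toy model, arrivals below capacity leave no backlog. Arrivals at 30 per hour against a capacity of 20 leave 80 PRs queued after an eight-hour day, so a new PR waits about four hours before the pipeline even starts on it.</p>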
<p>You might also see an increased number of deployments into your production environment. This is good, mostly. If your services are configured to hard kill on deployments then every deployment means failed requests for some number of users. Maybe that&rsquo;s tolerable at a low deployment frequency. But keep in mind that a user request fails if <em>any</em> of the services along its call tree break. A microservice soup frothing with deployments will feel 88% available instead of 99%.</p>
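<p>A figure like 88% is what you get from, for example, a call tree of twelve services that are each 99% available. The sketch below assumes the services fail independently and that a request needs every one of them; both are simplifying assumptions.</p>
<pre tabindex="0"><code>-- Availability of a request that must pass through n services in series,
-- assuming each one is independently available with probability p.
serialAvailability :: Double -&gt; Int -&gt; Double
serialAvailability p n = p ^ n

main :: IO ()
main = mapM_ (\n -&gt; print (n, serialAvailability 0.99 n)) [1, 5, 12]
</code></pre><p>At twelve services the result is roughly 0.886, and it keeps falling as the call tree gets deeper.</p>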
<h2 id="double-bind">Double Bind</h2>
<p>It&rsquo;s a cruel irony: our best way to catch AI mistakes is to increase V&amp;V work, but that slows down the new bottleneck stage. We need to increase the capacity of build systems to match.</p>
<ul>
<li>If you depend on external vendors during build and deploy, you need to monitor them carefully to see when it&rsquo;s time to pull those dependencies in house.</li>
<li>Platform teams are usually understaffed and their tools undersized. Keep an eye on your artifact and container repositories.</li>
<li>SAST and DAST capacity are going to be very tough to increase since their output still often requires human review.</li>
<li>We typically instruct AI agents to write unit tests but we don&rsquo;t tell them to measure test execution time or consolidate tests. As a result, test times have a way of creeping upward. Consider replacing comprehensive unit tests with acceptance tests.</li>
</ul>
<h2 id="converging-parallel-workstreams">Converging Parallel Workstreams</h2>
<p>So far I&rsquo;ve mostly talked about a single flow through a single process. In reality, each team is like their own production line. Those lines all converge at a big integration step. Defects creep in at the interfaces between services, so integration is where many defects are uncovered. Great! We <em>want</em> our build process to catch those defects and reject changes that would break the system. However, each rejected build means work goes back upstream. Coding is not the bottleneck any more, so we shouldn&rsquo;t worry about adding to the load on the coding agents. However, once an agent fixes a defect it pushes another change forward into V&amp;V. This added load is called &ldquo;failure demand.&rdquo;</p>
<p>Another lesson from ToC is to do more work upstream to avoid putting defective pieces into the bottleneck. Does that create a paradox? How do I avoid putting defective work into the stage that detects the defects?</p>
<p>One way is through more cross-service quality checking during code creation. Local builds run by agents should catch more of the integration defects. Perhaps via property-based testing. Maybe we move to strongly typed API specifications outside of the service code, taking a lesson from the days of <a href="https://en.wikipedia.org/wiki/Interface_description_language">IDLs</a>.</p>
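<p>One check an agent&rsquo;s local build could run is that a proposed API specification only adds to the published one. This is a toy sketch: the <code>Spec</code> representation, the operation names, and the signatures are all hypothetical, and a real IDL would compare structured types rather than strings.</p>
<pre tabindex="0"><code>-- A toy API specification: operation name paired with its type signature.
type Spec = [(String, String)]

-- A change is compatible if every published operation is still present
-- with the same signature; new operations may be added freely.
isCompatible :: Spec -&gt; Spec -&gt; Bool
isCompatible published proposed =
  all (\(name, sig) -&gt; lookup name proposed == Just sig) published

main :: IO ()
main = do
  let published = [("getOrder", "OrderId -&gt; Order")]
      additive  = ("listOrders", "CustomerId -&gt; [Order]") : published
      breaking  = [("getOrder", "OrderId -&gt; Maybe Order")]
  print (isCompatible published additive)   -- True
  print (isCompatible published breaking)   -- False
</code></pre><p>A failure here costs seconds on the agent&rsquo;s side instead of a rejected build at the integration step.</p>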
<h2 id="summing-up">Summing Up</h2>
<p>Our architecture and processes evolved for human-driven coding rates. Code production is no longer the bottleneck, if it ever was. Right now, it looks like the next bottleneck will be version control, build, test, and deployment systems. The lag time between &ldquo;code commit&rdquo; and &ldquo;running in production&rdquo; has gotten longer and longer as we&rsquo;ve added more automated checks&hellip; and we need to add even more to catch agentic errors. Our build systems aren&rsquo;t prepared for the volume of work. We need to elevate the capacity of those processes and simultaneously offload work to upstream stages where possible. It&rsquo;s OK to generate more code if it reduces failure demand or validation time in the build pipeline.</p>
]]></content></entry><entry><title>AI Versus Microservices</title><link href="https://michaelnygard.com/blog/2026/05/ai-versus-microservices/"/><id>https://michaelnygard.com/blog/2026/05/ai-versus-microservices/</id><published>2026-05-09T09:39:41-05:00</published><updated>2026-05-09T09:39:41-05:00</updated><content type="html"><![CDATA[<p>Microservices were always a technical solution to an organizational problem.</p>
<h2 id="the-road-more-traveled">The Road More Traveled</h2>
<p>Think back to the early 2010&rsquo;s and imagine yourself as a startup CEO. You have a vision for a better app or website and you&rsquo;ve gotten a bunch of VC funding. The trouble is that anywhere between ten and a thousand other startups have <em>very similar</em> ideas. As with any <a href="https://en.wikipedia.org/wiki/Metcalfe%27s_law">Metcalfe&rsquo;s law</a> company, at most two of you will survive. First place wins 80% of the market. Second place gets 15%. Third place is you&rsquo;re fired.</p>
<p>Your singular objective is to gain customers faster than your competitors.</p>
<p>You&rsquo;ve got a CRO with incredible hockey-stick charts. Your CFO has a burndown chart and an &ldquo;end by&rdquo; date. If the hockey stick bends upward before the burndown crosses zero, you win. If not, you lose.</p>
<p>You also have a CTO telling you that the team is working as fast as they can, but there&rsquo;s only so much code anyone can sling in a day, and his ship dates push the hockey stick <em>way</em> too far out to the right. The only possible response you have is &ldquo;then get a bigger team&rdquo;. The CTO says something about <a href="https://en.wikipedia.org/wiki/The_Mythical_Man-Month">mythical months and pregnant women</a>, which sounds both totally irrelevant and possibly sexist to you. The CTO &ndash; or the next occupant of that chair &ndash; will expand the dev team.</p>
<p>So now the CTO has a problem. Expand the team by 10, 20, or 100x and ship faster. But experience shows that adding people to a project slows it down due to communication overhead and coordination cost. The solution is to break the product into many smaller projects, each with its own two-pizza team.</p>
<p>Microservices allow horizontal scaling of a dev organization with sub-exponential coordination cost. Each team only needs to know about its local neighborhood of services. (This also reduces the other negative effect of rapid hiring: the handful of people that have global knowledge about the system are diluted and outnumbered 100 to 1.)</p>
<p>In other words, microservices are an attempt to balance two opposing forces: exponential slowdown from communication and coordination (C&amp;C) versus linear speedup from parallel feature development. If the team gets the balance right and carves the service boundaries well, they can continuously ship small batches of functionality and A/B test your company into the unicorn club. If they get it wrong, they&rsquo;ll spend all their time at the whiteboard and all your money on Splunk while your customers complain on Reddit.</p>
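<p>For a rough feel of the two forces, the sketch below uses the pairwise channel count n(n-1)/2 as a stand-in for C&amp;C cost. That is a deliberately simple model: it grows quadratically rather than exponentially, and it assumes teams coordinate through a single contact each.</p>
<pre tabindex="0"><code>-- Pairwise communication channels among n people: n(n-1)/2.
channels :: Int -&gt; Int
channels n = n * (n - 1) `div` 2

-- Channels when the same people are split into equal teams that talk to
-- each other only through one contact per team (a simplifying assumption).
splitChannels :: Int -&gt; Int -&gt; Int
splitChannels teams teamSize = teams * channels teamSize + channels teams

main :: IO ()
main = do
  print (channels 100)          -- one team of 100 people: 4950
  print (splitChannels 10 10)   -- ten teams of 10 people: 495
</code></pre><p>Even under this mild model, carving one hundred people into ten teams cuts the number of channels by a factor of ten. That only holds if the service boundaries really do keep most conversations inside the team.</p>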
<h2 id="sidebar---microservices-in-big-companies">Sidebar - Microservices in Big Companies</h2>
<p>Large companies also found benefits to carving ancient monoliths into microservices. In their case it was less about out-competing to win a market &ndash; though some of them needed to compete a lot better to <em>retain</em> their market than they had been. Instead they needed to break out of a web of cyclic technical dependencies and architectural decay that was slowing development to a crawl. Basically, many of them found that they were <em>already</em> on the wrong side of the exponential C&amp;C slowdown and needed to get shipping again.</p>
<h2 id="whats-changed---flavor-1">What&rsquo;s Changed - Flavor 1</h2>
<p>The same story about scaling out via microservices and dedicated two-pizza teams can be interpreted differently. From a critical perspective, the result is also organizational boundaries / managerial lines of control around architectural boundaries in the system. (It&rsquo;s legally required to mention <a href="https://martinfowler.com/bliki/ConwaysLaw.html">Conway&rsquo;s Law</a> and the <a href="https://jonnyleroy.com/2011/02/03/dealing-with-creaky-legacy-platforms/">Inverse Conway Maneuver</a> here.) The trouble is that it&rsquo;s much, much easier to change boundaries in software than in the organization.</p>
<p>In a monolith, nobody gets laid off when you delete a class.</p>
<p>With microservices, no team ever eliminates their own raison d&rsquo;être and lays themselves off. Instead you&rsquo;ll get a version two or version three that expands the functionality of their services.</p>
<p>Your startup from the 2010&rsquo;s now has six thousand services owned by hundreds of squads and nobody knows how it all works.</p>
<p>You mandate that every developer (and non-developer) must use AI coding agents for their work, seeking the speedup that you believe is possible. And it works! Pull request volume shoots way up. Lines of code modified is through the roof! Small features are delivered faster: copy edits, graphical tweaks, UX changes. They&rsquo;re even being tested behind feature flags and experiments and agents are automatically removing changes that test badly.</p>
<p>(A side effect is that your customers never see the same app twice, and encounter paper-cut bugs on a daily basis. But you&rsquo;ve got an AI chatbot responding to their complaints so you don&rsquo;t hear their frustration yet.)</p>
<p>The trouble is that the <em>big</em> things aren&rsquo;t moving any faster. New markets, new products, new cross-cutting features are still just as slow to produce because your overall architecture is still fragmented.</p>
<p>(This will come as no surprise to anyone who has read their <a href="http://reinertsenassociates.com/books/">Reinertsen</a>, <a href="https://itrevolution.com/product/wiring-the-winning-organization/">Kim &amp; Spear</a>, or <a href="https://www.amazon.com/Lean-Software-Development-Agile-Toolkit/dp/0321150783">Poppendieck</a>.)</p>
<p>With AI agents you <em>want</em> to scale down your dev team but the architecture was optimized for scaling out not down. You need each developer, and their pod of AI agents, to own larger units of code but the microservice boundaries are too small and fragmented.</p>
<p>Meanwhile, the constant news of large-scale layoffs has every developer scared for their current job and scared they won&rsquo;t find another one. So absolutely <em>nobody</em> is going to raise their hand and say &ldquo;I think my service shouldn&rsquo;t exist any more.&rdquo; (It seems more likely that you&rsquo;ll have vicious turf wars and middle managers running annexation campaigns, all with AI-produced docs and decks beautifully justifying their maneuvers.)</p>
<p>The resulting tension is the next <a href="https://asimov.fandom.com/wiki/Seldon_Crisis">Seldon Crisis</a> facing these companies.</p>
<h2 id="whats-changed---flavor-2">What&rsquo;s Changed - Flavor 2</h2>
<p>It is now almost twenty years since the <a href="https://www.michaelnygard.com/blog/2008/06/webber-and-fowler-on-soa-man-boobs/">anti-SOA rebellion</a>. It&rsquo;s fair to call the microservice architecture the dominant architectural style. Startups today begin with a monolith in their early days but once they find a degree of product/market fit, they look to rebuild for scale in microservices.</p>
<p>I don&rsquo;t know what the next dominant style will be. Maybe we&rsquo;ll call it &ldquo;macroservices&rdquo; or &ldquo;megaservices&rdquo;, though I doubt it. Neither of those words has the &ldquo;cool factor&rdquo; that will help consultancies sell services.</p>
<p>I can say that we need to find a new way to draw the boundaries. Here are some of the forces I see that will affect how we do that:</p>
<ul>
<li>It has to be much, much safer to ship code than it is today. That&rsquo;s a simple consequence of the risk equation:
\[ \text{Expected Loss} = \sum_{i=1}^{n} \left( P(\text{loss}_i) \times \text{Opportunities}_i \times \mathbb{E}[\text{loss per opportunity}_i] \right)
\]
We rely on automation in our CI pipelines: unit tests, security scans, some performance testing. We also rely on experimentation and feature flags. The trouble is that AI agents are just as likely to rewrite or delete the tests, or to &ldquo;reinterpret&rdquo; experimentation results, in order to reach the goals we give them.</li>
<li>Governance mechanisms must be rebuilt to deal with the scale of changes. Security scanning tools already report far too many false positives for the human-driven rate of change. Domain name governance isn&rsquo;t sufficiently automated at most companies. Data governance is partially automated, partially human driven for classification. Systems for data subject rights are cobbled together. As it is, few of the technical staff care about data subject rights and <em>none</em> of the non-technical staff pushing code are worried about it. I predict ever more frequent small scale data breaches of the stupid &ldquo;DB endpoint was left public&rdquo; variety. (In this post I&rsquo;ve only been talking about AI driven development, not about agent-in-the-loop systems, but data governance for those is in an impossible double-bind. Everybody wants their agent to have access to all the data, but the agents have no guardrails at all. They&rsquo;ll readily ship all your private customer data off to a free-tier Firebase if you let them.)</li>
<li>Access control based on tokens and keys seems fundamentally unfixable. Leaked keys in externally hosted repositories are exploited in minutes and can cost a fortune before they&rsquo;re discovered. So far, there don&rsquo;t seem to be guardrails immune to agentic bypass. There are too many external platforms that agents are trained on, but your company&rsquo;s bespoke platform is in nobody&rsquo;s corpus. Any non-technical user&rsquo;s agent can spin up a fresh Vercel account and your pipeline&rsquo;s checks cannot stop it.</li>
<li>Each developer must own larger units of code. I have previously valued collective code ownership. However that&rsquo;s a principle for spreading knowledge among humans, with the intent of diffusing knowledge faster than the truth of the code changes. That now appears hopeless. The change rate is too fast. Not even one human fully understands the code they&rsquo;re accountable for.</li>
<li>Another reason for larger ownership scope: Merge conflicts are a big issue again. Agents will happily rewrite code that is working fine. Two devs with their swarms create <em>massive</em> merge conflicts. Right now this looks like <em>even more productivity</em> since resolving those conflicts spends tokens and boosts LOC changed metrics. That activity doesn&rsquo;t create value though.</li>
<li>We need validation beyond test code in the same repository. Constant rewriting by the agents causes regressions. Agents will &ldquo;fake it&rdquo; by changing the test instead of fixing the code. If you try to nail all existing behavior in place with vast suites of unit tests, then you&rsquo;ll spend all your tokens on reading and updating test code instead of production code. More tokenmaxxing without value.</li>
<li>The specification of APIs a service offers must be externalized. When changing code, agents don&rsquo;t value the principle of keeping your interface stable. They&rsquo;ll change an API that other services depend on. You might catch this in CI, if you&rsquo;ve built good compatibility checking. It&rsquo;s better to put the API specification someplace separate and confine the coding agents so they aren&rsquo;t allowed to mutate it during normal development. (A sub-principle here is that accretive interfaces are more important than ever so that unilateral change in the specification &ndash; by a different agent &ndash; is possible.)</li>
<li>We need to think about <a href="https://teamtopologies.com/">team topologies</a> at two different scales: within the &ldquo;pod&rdquo; of the developer and agents, and between pods. This will determine how fast you ship large scale (cross-pod) features more than PR rates or LOC changes.</li>
</ul>
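<p>The risk equation in the first bullet can be sketched in a few lines. The field names and the numbers are invented for illustration; the point is only that multiplying the number of opportunities multiplies the expected loss unless the probability or the cost per loss comes down.</p>
<pre tabindex="0"><code>-- One category of risk: probability of a loss per opportunity, number of
-- opportunities (for example, deployments), and expected loss per event.
data Risk = Risk
  { pLoss         :: Double
  , opportunities :: Double
  , lossPerEvent  :: Double
  }

-- Sum of P(loss) * opportunities * E[loss per opportunity] over all risks.
expectedLoss :: [Risk] -&gt; Double
expectedLoss risks =
  sum [pLoss r * opportunities r * lossPerEvent r | r &lt;- risks]

main :: IO ()
main = do
  let humanPace = [Risk 0.01 50 10000]    -- hypothetical: 50 deployments
      agentPace = [Risk 0.01 500 10000]   -- ten times as many deployments
  print (expectedLoss humanPace)
  print (expectedLoss agentPace)
</code></pre><p>With everything else held constant, ten times the deployments means ten times the expected loss.</p>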
<h2 id="summing-up">Summing Up</h2>
<p>Every radically new technology has caused a shift in the dominant architecture, and often in the languages and platforms, that we employ. The new architecture will look obvious in hindsight, but what doesn&rsquo;t? It will hit the right balance of adjacency to existing technology, re-balancing of forces in tension, and mass appeal to become the next &ldquo;hot topic&rdquo; of books, conference talks, consultancies, etc. That doesn&rsquo;t mean it will be the ideal approach&hellip; every new solution has within it the seeds of the next problems. So whatever the successor architecture becomes, it will create new niches for tooling, supporting systems, languages, frameworks, and the like.</p>
<p>I&rsquo;ve tried to avoid cynicism about either the current state of affairs or the future. But I will confess that &ldquo;doing things right&rdquo; has never seemed less valued than it does now. In the push to have everybody shipping code, all the time, I worry that we are accumulating mountains of invisible coupling, fragile dependencies, uncontrolled production environments &hellip; it adds up to a lot of technical risk. We do not (yet) have platforms or processes to manage that risk. When the bill comes due, the blame will fall on the front line staff not the ones who set up the incentives and reaped the rewards.</p>
<p><em>AI disclosure: All content human crafted, except for getting the LaTeX math syntax right.</em></p>
]]></content></entry><entry><title>Constrain the Provider to Liberate Callers</title><link href="https://michaelnygard.com/blog/2024/02/constrain-the-provider-to-liberate-callers/"/><id>https://michaelnygard.com/blog/2024/02/constrain-the-provider-to-liberate-callers/</id><published>2024-02-27T05:43:48-06:00</published><updated>2024-02-27T05:43:48-06:00</updated><content type="html"><![CDATA[<p>Back in the Before Times, I went to a Haskell-flavored FP conference,
where one of the speakers said something that blew my mind. Sadly, it
seems that I didn&rsquo;t write this up at the time (although I swear I
wrote it somewhere&hellip; maybe in an internal company memo) and I&rsquo;ve lost
the details of who said it. If by some quirk of odds, it was <em>you</em>,
dear reader, please let me know!</p>
<p>The speaker posed a question: Given the following function
signature&ndash;and no additional information at all&ndash;how many possible
implementations of <code>f</code> are there?</p>
<pre tabindex="0"><code>f :: a -&gt; a
</code></pre><p>For those who don&rsquo;t read Haskell, it says &ldquo;the function f takes an
argument of type <code>a</code> and returns a value of type <code>a</code>&rdquo;. We have no
information about <code>a</code> whatsoever, and that&rsquo;s the key to the
koan. Without any information about type <code>a</code>, <code>f</code> cannot apply any
operation to the value it receives. It cannot create a new instance of
<code>a</code> because it doesn&rsquo;t know how to construct it. It cannot modify the
value because it doesn&rsquo;t know any other function to apply. In fact,
the only possible implementation of <code>f</code> is <code>id</code>, the identity
function. If you give me an <code>a</code>, all I can do is give it back to you.</p>
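<p>Here is that single implementation written out, as a small sketch. It sets aside definitions that never return a value at all, such as <code>undefined</code> or an infinite loop.</p>
<pre tabindex="0"><code>-- With no information about a, all f can do is hand its argument back.
f :: a -&gt; a
f x = x   -- the same function as id

main :: IO ()
main = do
  print (f (42 :: Int))     -- 42
  putStrLn (f "hello")      -- hello
  print (f [True, False])   -- [True,False]
</code></pre>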
<p>At first glance, this seems trivial or even tautological. But there&rsquo;s
something profound under the surface. If, like me, you come from the
dynamic side of FP or the world of OOP where you always have a base
class with some operations available, this might need a little bit of
unpacking. I don&rsquo;t want to turn this into a Haskell tutorial (I would
certainly get a lot of it wrong!) but a couple of ideas are necessary.</p>
<p>Suppose we wanted to say that a different function <code>g</code> could operate on any integer. We
would then have a different signature:</p>
<pre tabindex="0"><code>g :: Integer -&gt; Integer
</code></pre><p>In this case, <code>g</code> can take any possible integer into any other
possible integer. Now we know way less about the implementation. It
could increment or decrement its argument. It could add 42 to odd
values and subtract 99 from even ones. If we&rsquo;re working with 64 bit
integers, there are 18,446,744,073,709,551,616 implementations of this
function that just ignore their argument and return a constant!</p>
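<p>A few of those implementations, sketched out. The function names are only for illustration.</p>
<pre tabindex="0"><code>-- Three of the many functions that fit the signature Integer -&gt; Integer.
increment :: Integer -&gt; Integer
increment n = n + 1

-- Adds 42 to odd values and subtracts 99 from even ones.
oddEven :: Integer -&gt; Integer
oddEven n
  | odd n     = n + 42
  | otherwise = n - 99

-- Ignores its argument and returns a constant.
alwaysSeven :: Integer -&gt; Integer
alwaysSeven _ = 7

main :: IO ()
main = mapM_ (\g -&gt; print (map g [1, 2, 3])) [increment, oddEven, alwaysSeven]
</code></pre><p>All three type-check against the same signature, so the signature alone tells a caller nothing about which one it is getting. (Haskell&rsquo;s <code>Integer</code> is actually arbitrary precision; the 64 bit count applies to a fixed-width type such as <code>Int64</code>.)</p>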
<p>We could also use typeclasses to indicate partial information about the values. Suppose we wanted to create a function <code>h</code> that accepts values which can be compared and placed in an ordering. That would look something like:</p>
<pre tabindex="0"><code>h :: (Ord a) =&gt; a -&gt; a
</code></pre><p>(I probably have the syntax wrong, but bear with me&hellip; it&rsquo;s the idea that is important here.)</p>
<p>This says that <code>h</code> can take any value of any type, as long as that
type is known to fulfill the requirements of the <code>Ord</code>
typeclass. <code>Ord</code> is fairly minimal: it just requires that the type
implements a few operations like &ldquo;less than&rdquo; and &ldquo;greater than&rdquo;. (It
also brings in <code>Eq</code>, but we don&rsquo;t need that at the moment.)</p>
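<p>The constraint earns its keep once there is more than one value to compare. As a sketch, here is a two-argument function (the name <code>largest</code> is made up) that relies on nothing except <code>Ord</code>:</p>
<pre tabindex="0"><code>-- The only thing this function knows about a is that its values can be
-- compared, so comparing is the only operation it may use.
largest :: Ord a =&gt; a -&gt; a -&gt; a
largest x y = if x &lt; y then y else x

main :: IO ()
main = do
  print (largest (3 :: Int) 7)        -- 7
  putStrLn (largest "apple" "pear")   -- pear
</code></pre><p>The same definition works for integers, strings, or any other ordered type, because the implementation was never allowed to assume more than ordering.</p>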
<p>Let&rsquo;s get back to our original <code>f :: a -&gt; a</code>: By using the type
variable <code>a</code> with zero additional information about <code>a</code>, the compiler
enforces constraints on the implementer of <code>f</code>. <code>f</code> must accept any
value of any type whatsoever. <code>f</code> is therefore not allowed to know
anything about the arguments. It cannot make any assumptions about the
values it will be called with. In a sense, <code>f</code> is maximally
constrained because it has the least information possible about its
arguments.</p>
<p>On the flip side, a caller of <code>f</code> is maximally liberated. Because the
implementation cannot make any assumptions, it also means the
implementation cannot impose any constraints on the caller. The caller
can pass <em>any</em> value it wants to. The caller can use <code>f</code> in contexts
and situations that the implementer of <code>f</code> never dreamed of.</p>
<p>I think this generalizes to a principle: Constrain the provider to
liberate callers.</p>
<p>In my experience, elegant systems emerge when we have functions or
modules that can be used in multiple contexts. So we would like to
have multiple callers. But whatever functionality the provider offers
can only be used in contexts that meet its assumptions. Callers are
restricted to meeting the provider&rsquo;s assumptions. Therefore
assumptions in the provider constrain the caller. We should invert
this: limit the provider&rsquo;s ability to make assumptions thereby
allowing callers to use it in various situations.</p>
<p>This extends to services across an enterprise, too. The more a
provider is allowed to know about how it will be used, the narrower
the ability to reuse or recompose the services will be. Extend that
forward and backward along call chains in a distributed services
environment, and you will see inflexibility set in as every caller of
some service is itself a provider to others.</p>
<p>I think this is also linked to the phenomenon of single-use API
definitions, where an API is written for a specific point-to-point
interaction. The implementation inevitably makes too many assumptions
about how it will be used. So you get an environment with a
proliferation of APIs each with their own payload types.</p>
<p>Another related idea is <em>selective amnesia</em>. When designing an API to
offer, you can choose to temporarily forget what you know about the
caller. Instead think about how a second or third caller might want to
invoke your service. This leads to an API that is &ldquo;one notch more
abstract&rdquo; than you might otherwise design. (Before you shout YAGNI,
please recall that we must <a href="https://www.michaelnygard.com/blog/2018/04/evolving-away-from-entities/">weaken YAGNI across service
boundaries</a>.)
Selective amnesia can help constrain the provider from
making assumptions.</p>
]]></content></entry><entry><title>Rule of Eights</title><link href="https://michaelnygard.com/blog/2023/05/rule-of-eights/"/><id>https://michaelnygard.com/blog/2023/05/rule-of-eights/</id><published>2023-05-28T10:29:28-05:00</published><updated>2023-05-28T10:29:28-05:00</updated><content type="html"><![CDATA[<p>The &ldquo;rule of eights&rdquo; is a handy way to think about feedback cycle time and the effect it has on human attention spans. This is something I heard&ndash;and am probably misremembering to some extent&ndash;at an agile conference back in the day. I can&rsquo;t take credit for this but I also can&rsquo;t remember who I heard it from. If you know who came up with this, please contact me so I can properly attribute this.</p>
<p>(Edit: Dion Stewart points out that this resembles the <a href="https://www.nngroup.com/articles/powers-of-10-time-scales-in-ux/">Powers of Ten</a> article from <a href="https://www.nngroup.com/articles/author/jakob-nielsen/">Jakob Nielsen</a>. I&rsquo;ve read some of Nielsen&rsquo;s work, so it&rsquo;s entirely possible that I have misremembered and conflated that with some other thoughts on feedback.)</p>
<p>The basic idea is that the speed of feedback has a huge effect on a person&rsquo;s ability to stay in a flow state. The longer the feedback takes, the more likely the person is to context-switch onto some other activity.</p>
<h2 id="80-milliseconds">80 milliseconds</h2>
<p>Faster than human reflexes. Feels effectively instantaneous. Unlikely to break flow.</p>
<h2 id="800-milliseconds">800 milliseconds</h2>
<p>A noticeable hitch or pause. Enough to be annoying while typing, but not overly irritating when running a command. Reasonable for an &ldquo;execute on save&rdquo; command like linting or formatting. Unlikely to break flow.</p>
<h2 id="8-seconds">8 seconds</h2>
<p>A discernible pause. Cannot be a continuous part of workflow, but might be a kind of punctuation between tasks. Will cause thoughts to wander. May cause alt-tabbing over to check email.</p>
<h2 id="80-seconds">80 seconds</h2>
<p>Annoying. Enough to get bored and provoke tab-switching, with high likelihood of lost flow.</p>
<h2 id="8-minutes">8 minutes</h2>
<p>This is a coffee break. Certain to break flow. A diligent dev may alt-tab to HN with the intent to come back when the job is done (but will probably get diverted into other work until long after the 8 minutes is over.) Will cause the developer to find ways to avoid incurring this. If this is your test execution time, tests will degrade as they are avoided during most dev work.</p>
<h2 id="80-minutes">80 minutes</h2>
<p>Flow is a long-lost dream. This is a lunch break. Devs will organize their day around <em>not</em> having to run this job. They get basically two runs per day at this pace: probably once in the morning and once in the afternoon and the afternoon job may be kicked off before heading out the door.</p>
<h2 id="8-hours">8 hours</h2>
<p>Flow? What flow? This is something the dev starts before leaving for the day. Everything is scheduled around this execution time. The job is very likely to break overnight, and devs will invent ways to &ldquo;recover&rdquo; a broken build via monkey-patching, hotfixing, etc. This is waste on top of waste, but feels more &ldquo;right&rdquo; than incurring another day of delay. The underlying job will get more and more flaky as people come to rely on spackle and grout instead of fixing the foundation.</p>
<h2 id="8-days">8 days</h2>
<p>Effectively infinite. Jobs very likely to fail. Flow is irrelevant. This is the domain of researchers, prisoners, and postdocs.</p>
]]></content></entry><entry><title>The Bad Idea Game</title><link href="https://michaelnygard.com/blog/2023/05/the-bad-idea-game/"/><id>https://michaelnygard.com/blog/2023/05/the-bad-idea-game/</id><published>2023-05-02T06:34:46-05:00</published><updated>2023-05-02T06:34:46-05:00</updated><content type="html"><![CDATA[<p>About ten years ago, I was introduced to something called &ldquo;The Bad
Idea Game&rdquo; by Danvers Fleury. We were doing a company strategy
retreat. Fortunately we did not spend it all on wordcrafting mission
and values statements, and we actually engaged in some good strategy.</p>
<p>The bad idea game was a fun exercise that didn&rsquo;t seem to produce any
directly useful results. At first I thought it had been a waste of a
precious hour from our limited supply. Afterwards, however, I noticed
that we were thinking more broadly and considering more creative
options.</p>
<p>Since then, I have used the Bad Idea Game occasionally when it seems
that groupthink has emerged or people have become circular in their
thinking. It helps break out of those recursive loops.</p>
<p>It is a facilitated exercise that starts by posing a challenge.</p>
<p>&ldquo;Instead of hitting the problem head-on, let&rsquo;s go a different
direction. For the next hour, we&rsquo;re going to think of the worst, most
value-destroying ideas we can. For example, instead of thinking about
how to improve your company image you would think of ideas for
<em>absolutely demolishing</em> your company image. Things like: put out a
video of you murdering puppies on main street. We&rsquo;re going to do this
because sometimes we can find good ideas in the &lsquo;shadow&rsquo; of bad
ones by reversing them or combining their reversals.&rdquo;</p>
<p>You&rsquo;ll get some (nervous) laughter at first and people will need some
prodding. Use your usual facilitation tricks and tools (having a plant
in the audience always helps). After a couple of minutes to explain
the silliness of the exercise, have people spend five or ten minutes
writing ideas on sticky notes.</p>
<p>Put the notes up on the board and let people riff on them for a while.</p>
<p>Depending on the group and the psychological safety in the company it
might be best to <em>not</em> write down the bad ideas.</p>
<p>Why is this a useful exercise? It rarely produces a good idea
directly in the session itself, but I think it helps in two other
ways.</p>
<p>First, as I mentioned before, it can break people out of their habits
of thought. It engages some creative faculties.</p>
<p>The second, more subtle effect that I didn&rsquo;t appreciate until later,
is that framing things as bad ideas gives you license to start <em>naming
elephants</em>. In any company there are some things you just <em>don&rsquo;t talk
about</em>. Sensitive areas. Political hot buttons. Toes on which you must
not step. Deeply ingrained assumptions. And sometimes, those things
are holding you back. <em>Especially</em> if those deeply held beliefs are
the very things that made you successful so far. And doubly so if your
company isn&rsquo;t good at self-reflection and confronting its own sacred
cows.</p>
<p>If you&rsquo;re going to call the CEO&rsquo;s baby ugly, it&rsquo;s better to do it
under the cover of a bad idea exercise than to come right out and say
it.</p>
]]></content></entry><entry><title>Everything We Build Has a Future Cost</title><link href="https://michaelnygard.com/blog/2023/04/everything-we-build-has-a-future-cost/"/><id>https://michaelnygard.com/blog/2023/04/everything-we-build-has-a-future-cost/</id><published>2023-04-16T10:53:29-05:00</published><updated>2023-04-16T10:53:29-05:00</updated><content type="html"><![CDATA[<p>Suppose we build a road. If we build the road and walk away, it will
decay into a hazard before long. It will be scoured by wind, rain, and
sand. Ultraviolet rays from the sun will break down its molecular
structure. The shifting earth beneath will crack and buckle it.</p>
<p>We must maintain what we build, and that requires expense.</p>
<p>Suppose we decide that the road is no longer needed or that it costs
more to maintain than it is worth. There is expense to <em>removing</em> what
we have built, too.</p>
<p>We cannot just close it and leave it alone. We must divert traffic off
of that road to others, which may require some incremental new road
construction to make connections. If people have moved into
neighborhoods on that road, we must build new routes to connect their
homes&ndash;or we must relocate them. If there are businesses, we must
(somehow) deal with the owners. Once we&rsquo;ve reduced usage to
nothing&ndash;which can take years&ndash;we must tear up the asphalt or
concrete, haul it away (to where?), and restore the land where the
road sat.</p>
<p>Let&rsquo;s consider a smaller case. Consider the humble birdfeeder, a
trivial construct. You buy it, hang it, and fill it. The air fills
with birdsong and you fill with joy. Of course, the birds eat the feed
so you must replenish it. That&rsquo;s the obvious future cost. Less obvious
is that the plastic will degrade and break, and you will eventually
replace the whole unit. It may not seem like much to hang the new unit
in the old one&rsquo;s place, but it does cost you time. For the aged or
infirm, it may have a capital cost if a handyman is needed. Discarding
the old feeder means throwing it in the trash&ndash;which you pay to haul
away&ndash;or recycling&ndash;which you or your local government pay for as well.</p>
<p>It would be wise to consider the future cost when you undertake a
project to build something. Maintenance, replacement, switching cost,
disassembly, and removal&hellip; when you account for all of those, perhaps
the present value of the project isn&rsquo;t as appealing as it appeared.</p>
]]></content></entry><entry><title>Four Meanings of Priorities</title><link href="https://michaelnygard.com/blog/2023/04/four-meanings-of-priorities/"/><id>https://michaelnygard.com/blog/2023/04/four-meanings-of-priorities/</id><published>2023-04-12T07:20:11-05:00</published><updated>2023-04-12T07:20:11-05:00</updated><content type="html"><![CDATA[<p>When trying to communicate, we sometimes use the same word thinking
that it means the same thing to everyone. But words are slippery,
multivalent things. I can speak a word with one meaning and you might
hear it with another. The result is the illusion of communication.</p>
<p>As a leader you must be aware that your words can be taken in
different ways. In one kind of culture, people might look for the most
sinister possible interpretation and assume that&rsquo;s what you <em>must
have</em> meant. Even in a healthy culture, it is possible for you to say
the exact same thing to two different people and they still end up
disagreeing about your intended message.</p>
<p>I&rsquo;ve been thinking about the word &ldquo;priority&rdquo; and its multiple
meanings. It&rsquo;s a word that comes up frequently. Everyone in your
organization deserves a clear understanding of priorities and how
their work connects to the organization&rsquo;s goals. However, I&rsquo;ve
identified at least four distinctly different meanings of &ldquo;priority&rdquo;.</p>
<p><strong>Sequence</strong></p>
<p>Priority-as-sequence means we will do all of priority 1, then all of
priority 2, and so on. In a small team or startup, this might be the
only useful definition.</p>
<p>It is the sense of a recipe: chopping vegetables is the first
priority, then you can saute them.</p>
<p><strong>Allocation</strong></p>
<p>As allocation, &ldquo;priority&rdquo; means we will spend the majority of our time
on the top priority, a lesser amount on the second priority, and so
on. This might be fractions of a week for a single team, or it might
be allocation of headcount across an organization.</p>
<p>This is the sense we mean when we say &ldquo;health is a priority&rdquo; or
&ldquo;family is a priority&rdquo;. We don&rsquo;t literally plan to &ldquo;finish all our
health first, then all our family&rdquo;. Instead it is a statement about
how we intend to spend our time.</p>
<p><strong>Trade-offs</strong></p>
<p>In this sense, a list of priorities is a pre-decided ordering of
trade-offs. When priority 1 and priority 2 are in tension, we know to
make the trade-off in favor of priority 1. This somewhat intersects
with priority-as-allocation, if the trade-off in question is &ldquo;where do
I devote my time&rdquo;. However it is distinct when the question at hand is
something like &ldquo;do I optimize for performance or cost?&rdquo;</p>
<p>The image here is buying a car. You might have &ldquo;appearance&rdquo; as a
priority over &ldquo;reliability&rdquo; or vice versa.</p>
<p><strong>Scope</strong></p>
<p>Priority can also mean &ldquo;the boundary of what I care about at all.&rdquo; The
list of priorities gives you permission to say &ldquo;no&rdquo; to other
demands. If your organization&ndash;like so many others&ndash;is drowning under
excess WIP and a years-long backlog, people will eagerly adopt this
meaning of priority. (Even if what you meant was &ldquo;allocation&rdquo; or
&ldquo;trade-offs&rdquo;.)</p>
<p>The image here is a backpack for hiking. There are only so many things
that will fit into it. Bringing a tennis racket is probably not a
priority.</p>
<p> </p>
<p>As with any question of definitions, none of these is more right than
others. The key is to make sure you have a shared understanding so
that everyone has clarity and can work toward the same purpose.</p>
]]></content></entry><entry><title>Transactions Aren't Everything</title><link href="https://michaelnygard.com/blog/2023/04/transactions-arent-everything/"/><id>https://michaelnygard.com/blog/2023/04/transactions-arent-everything/</id><published>2023-04-06T00:00:00+00:00</published><updated>2023-04-06T00:00:00+00:00</updated><content type="html"><![CDATA[<p>When building an application, we tend to select a database technology
based on its transactional characteristics. We consider raw
performance, API style, consistency model, data model, and deployment
architecture. That&rsquo;s about as much as your service cares about: can it
meet the functional and non-functional requirements for the production
behavior of the service?</p>
<p>Even in a microservice architecture where no other <em>application</em> is
allowed to access the service&rsquo;s database, that database probably has a
bunch of other <em>clients</em>. You may not think about these when making your
selection, but how well or badly your database supports them has a big
effect on its eventual success.</p>
<ol>
<li><strong>ETL connectors</strong>. Your company has some kind of ETL or ELT to feed bulk data from your service to analytical processing.</li>
<li><strong>PII Discovery and Classification Tools</strong>. These tools are part of data governance and will be a horizontal capability. They look through your schema and samples of your data to see if there is undeclared PII.</li>
<li><strong>Backup and Recovery</strong>. Whether this is something like cross-datacenter replication or &ldquo;cold&rdquo; storage of backup files, everybody&rsquo;s got one. (And by the way, have you tested your backups lately?)</li>
<li><strong>Query Optimization</strong>. Your service doesn&rsquo;t need this but you do. (Especially in the cloud where performance equals savings!)</li>
</ol>
<p>The horizontal tools are probably licensed at an enterprise
level. Your company has to pay for the connectors to each different
type of database in use. So if you&rsquo;re the only user of a particular DB
technology, that connector cost (whether license fee or labor to build
it) is part of the cost of choosing that technology. It&rsquo;s not a
trivial component of the overall price tag, so make sure you
understand the full cost/benefit equation when making your choices.</p>
]]></content></entry><entry><title>Counterfactuals are not Causality</title><link href="https://michaelnygard.com/blog/2021/06/counterfactuals-are-not-causality/"/><id>https://michaelnygard.com/blog/2021/06/counterfactuals-are-not-causality/</id><published>2021-06-19T16:27:50+00:00</published><updated>2021-06-19T16:27:50+00:00</updated><content type="html"><![CDATA[<p>Suppose we&rsquo;ve had a recent error with a Kubernetes cluster. As often happens with a problem in our systems, we noticed it first in terms of the visible error, which we could state as &ldquo;<strong>Builds did not complete.</strong>&rdquo; Now we want to trace backwards to figure out what happened. A common technique is the &ldquo;Five Whys&rdquo; popularized by Lean thinking. So we ask &ldquo;Why did builds not complete&rdquo; and we find &ldquo;Kubernetes could not start the pod, and the operation timed out after 1 hour.&rdquo;</p>
<p>We could certainly debate whether that&rsquo;s a single &ldquo;why&rdquo; or two of them in one step, but that&rsquo;s not the key topic right now. The main thing is that this is a straightforward statement about causality. &ldquo;Pod no start&rdquo; leads directly to &ldquo;build no done.&rdquo;</p>
<p>The next step in this analysis reveals that the pod would not start because a volume was full with too many files. Again, direct causality.</p>
<p>The tricky bit comes next. Why was the volume full with too many files? At this point, we&rsquo;re likely to see a change in the nature of the explanation. Some variation of the following might be offered:</p>
<ul>
<li>The admin did not configure file purging.</li>
<li>The cluster admins did not monitor for &ldquo;volume full&rdquo; conditions.</li>
<li>The developers did not clean up files from old builds.</li>
</ul>
<p>Do you notice how all these &ldquo;causes&rdquo; are stated in the form of something that didn&rsquo;t happen? They are &ldquo;counterfactuals.&rdquo;</p>
<p>A counterfactual is a statement about how the world might be different now if something had happened differently in the past. It&rsquo;s a kind of &ldquo;alternate history&rdquo; idea.</p>
<p>Here&rsquo;s the rub: a counterfactual <strong>cannot</strong> be a cause. By definition the counterfactual did not happen, therefore it cannot have caused anything. Only events that actually occur can be causes of other events. Causality should be stated in a form &ldquo;Because X then Y&rdquo;. The statement &ldquo;If not X then not Y&rdquo; is not an explanation, it is a kind of wishful thinking about how the past might have unfolded differently.</p>
<p>When performing Five Whys it is important to avoid this counterfactual leap. Stick to the events that actually occurred.</p>
<h2 id="unlimited-counterfactuals">Unlimited Counterfactuals</h2>
<p>Notice in the incident analysis I outlined earlier, there are three counterfactuals listed. Each of them independently would have been sufficient to avert the incident. But these are hardly the only three counterfactuals we could construct:</p>
<ul>
<li>We used Kubernetes for our CI cluster instead of static VMs.</li>
<li>We use CI instead of a human working at the command line.</li>
<li>We put code in a repository instead of directly editing files on production instances.</li>
</ul>
<p>I could go on, but you probably felt like the first three were somehow more reasonable than these three. In some way, the original set are &ldquo;closer&rdquo; to actual reality than these three. Nonetheless, I could go on constructing counterfactuals for an unlimited period of time. &ldquo;If the Earth hadn&rsquo;t been habitable then we would not be here to care about our CI builds not finishing.&rdquo; Once you start making counterfactuals, there&rsquo;s really no end to them. Again, that&rsquo;s because these are not events that happened. Only a finite number of events actually happened so the chain of causality is finite. An infinite number of things didn&rsquo;t happen so we can always find more &ldquo;missing things&rdquo; to blame.</p>
<h2 id="speaking-of-blame">Speaking of Blame</h2>
<p>This is also where people come into conflict when analyzing the chain of events. One person might posit a counterfactual about an event a different person or team didn&rsquo;t do. That person or team naturally bristles&ndash;it feels like they are being blamed. (And worse, being blamed for <strong>not</strong> doing something, so they are being called negligent!) They would be impelled to put forward their own counterfactual which might haul in yet another team. If the negative outcome was significant, this cloud of hypotheticals becomes a &ldquo;blamestorm&rdquo; looking to rain down on somebody. Defenses go up, and learning stops.</p>
<p>Counterfactuals are the condensation nuclei for blamestorms.</p>
<h2 id="using-counterfactuals-for-good">Using Counterfactuals For Good</h2>
<p>The counterfactual leap indicates where people stop looking for causes and jump to thinking about solutions. Try to reformulate the counterfactual as a statement about future prevention:</p>
<ul>
<li>If we configure file purging, then this won&rsquo;t happen again.</li>
<li>If we monitor for &ldquo;volume full&rdquo; conditions, then this won&rsquo;t happen again.</li>
<li>If we clean up files from old builds, then this won&rsquo;t happen again.</li>
</ul>
<p>These are useful statements. When formulated this way, they&rsquo;re clearly talking about the future and not hypothesizing an alternate history. (You might have noticed that I also snuck in a bunch of &ldquo;we&rdquo; statements in place of the more specific attributions above.)</p>
<p>As long as we remain clear that these counterfactuals are not the <strong>cause</strong> of the problem that already happened, but are changes to our reality that can prevent <strong>future</strong> occurrences, we can use them without inducing blamestorming.</p>
<p>As a practical technique, during a Five Whys or post-incident review, when someone poses a counterfactual as a cause, I suggest capturing it in the forward-looking version in a parking lot of potential changes.</p>
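<p>To make the second forward-looking statement concrete, here is a minimal Python sketch of a &ldquo;volume full&rdquo; check. It is an illustration, not the tooling from the incident: the function names and the 90% threshold are assumptions, and it relies on <code>os.statvfs</code>, which is only available on Unix-like systems. It looks at inodes as well as blocks, because a volume holding too many files can run out of inodes while it still has free space.</p>

```python
import os


def volume_pressure(path):
    """Return (fraction of blocks used, fraction of inodes used) for the
    filesystem that holds `path`. Hypothetical helper for illustration."""
    st = os.statvfs(path)
    # f_bavail / f_favail count what is still available to unprivileged users.
    # Some filesystems report zero totals, so guard against dividing by zero.
    blocks_used = 1.0 - st.f_bavail / st.f_blocks if st.f_blocks else 0.0
    inodes_used = 1.0 - st.f_favail / st.f_files if st.f_files else 0.0
    return blocks_used, inodes_used


def is_nearly_full(path, threshold=0.9):
    """True when either blocks or inodes are at or above the threshold.
    The 0.9 default is an assumed value, not a recommendation."""
    return any(fraction >= threshold for fraction in volume_pressure(path))


if __name__ == "__main__":
    blocks, inodes = volume_pressure(".")
    print(f"blocks used: {blocks:.1%}, inodes used: {inodes:.1%}")
    print("nearly full:", is_nearly_full("."))
```

<p>A check like this belongs in the parking lot of potential changes. It says nothing about why the incident happened.</p>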
<h2 id="stepping-farther-away-from-reality">Stepping Farther Away From Reality</h2>
<p>This reformulation also helps weed out the more far-fetched counterfactuals&hellip; the ones that felt kind of &ldquo;out there&rdquo; or even silly before. Let&rsquo;s try it with the second set from above:</p>
<ul>
<li>If we use static VMs instead of Kubernetes for our CI cluster, then this won&rsquo;t happen again. (Possibly true statement, though somewhat lacking in support.)</li>
<li>If we use a human working at the command line instead of CI, then this won&rsquo;t happen again. (Probably. Humans are more adaptable and can figure out when to purge files. But there are likely to be other undesirable effects.)</li>
<li>If we edit files directly on production instances instead of putting code in a repository, then this won&rsquo;t happen again. (Umm&hellip; definitely a case where the cure is worse than the disease!)</li>
</ul>
<p>This last one also lets me illustrate something about the counterfactuals from before. You might have felt more resistance to the second set because you were automatically thinking about negative consequences if that statement had been true. Humans are very good at hypothesizing these counterfactuals. Faced with a bad outcome, our brains spontaneously and instantly conjecture a large branching tree of alternate histories. And just as quickly, we prune that tree of those branches which we know would produce other negative effects that are <strong>worse</strong> than the outcome we had. Just imagine, &ldquo;If we provoke a nuclear war that ends civilization, then this CI build failure won&rsquo;t happen again.&rdquo;</p>
<p>So when I pose a counterfactual that says &ldquo;if we edit files directly on production instances, this won&rsquo;t happen again,&rdquo; your instinctive response is to say, &ldquo;yeah, <strong>but</strong>.&rdquo; This is now thinking about two steps away from the current reality. Step 1 is to imagine the alternative history where the counterfactual had occurred. Step 2 is to extrapolate the negative outcome of the consequences of that alternative history. Sometimes we even go further steps away from reality by postulating still more counterfactuals that could compensate for the negative consequences of the first one.</p>
<h2 id="conclusions">Conclusions</h2>
<p>Counterfactuals don&rsquo;t say anything about what actually happened. They express wishful thinking about an alternate history where the bad event <strong>didn&rsquo;t</strong> happen. Because they represent &ldquo;events that didn&rsquo;t occur&rdquo; they cannot have caused anything. However, <strong>stating</strong> a counterfactual can trigger an unhelpful round of blamestorming. Try to reformulate counterfactuals offered as explanations for past events so you can state them as injections to prevent recurrence. Of course, you must also contemplate what other effects those injections would have!</p>
<p>Watch out for the pitfall of counterfactuals when analyzing anything. It&rsquo;s a common trap for post-incident reviews, retrospectives, project post-mortems, and other cases when you need to reconstruct a chain of events.</p>
]]></content></entry><entry><title>"Manual" and "Automated" are just words</title><link href="https://michaelnygard.com/blog/2020/10/manual-and-automated-are-just-words/"/><id>https://michaelnygard.com/blog/2020/10/manual-and-automated-are-just-words/</id><published>2020-10-15T13:35:45-05:00</published><updated>2020-10-15T13:35:45-05:00</updated><content type="html"><![CDATA[<p>Driving down a shady road, windows down, listening to the frogs and crickets, my family was in the car talking about various stuff and things. This summer evening we happened to talk about the invention and emergence of the word &ldquo;yeet.&rdquo; I observed that it was kind of cool to have a word with a known origin and etymology, even if that was only because it was a made-up word.
My daughter instantly responded that &ldquo;all words were made up by someone.&rdquo;</p>
<p>What could I say? Of course it&rsquo;s true!</p>
<p>I&rsquo;ve previously talked about the difficulty that words present. In 2015 I discussed <a href="https://www.michaelnygard.com/blog/2015/04/the-perils-of-semantic-coupling/">the perils of semantic coupling</a> that could emerge when we get fooled by nouns. The existence of a noun makes us think we understand a concept. Once we try to define a predicate to answer &ldquo;Is X an instance of Y?&rdquo; for any noun Y it becomes difficult, verging on impossible, to find a categorical statement. Instead we fall back to the Potter Stewart method.</p>
<p>In <a href="https://www.michaelnygard.com/blog/2016/07/wittgenstein-and-design/">Wittgenstein and Design</a> (say that three times fast) I talked about pursuing adjectives instead of nouns as a way to carve a design space.</p>
<p>Today, I want to talk about how we use words as signifiers for their semiotic content. In particular, the words &ldquo;manual&rdquo; and &ldquo;automated.&rdquo;</p>
<h1 id="two-legs-good-four-legs-bad">Two-legs good, four-legs bad</h1>
<p>We are now ten years into the DevOps era. Among both practitioners and adopters, there is a tendency to use &ldquo;automated&rdquo; as a pseudo-synonym (psynonym?) for &ldquo;good&rdquo; while &ldquo;manual&rdquo; stands in for &ldquo;bad.&rdquo; The trouble is that the closer you look the harder it gets to tell whether any particular thing is manual or automated!</p>
<p>Suppose we are in an incident. I invoke the &ldquo;break glass&rdquo; process to ssh into a server to run a bash script. Was that manual or automated? Well, both, sort of.</p>
<ul>
<li>We are in an incident&hellip; probably initiated without human intervention based on monitoring systems that detected a triggering condition.</li>
<li>I invoke the break glass process&hellip; wait a second. How did I even get involved? Maybe the systems notified me directly via PagerDuty. That would have no human intervention. Or maybe our operations center decided to escalate to level 3 support, and I&rsquo;m the on-call this week. In the second case, a human decided the escalation was required and clicked a button in ServiceNow. ServiceNow then used a database to contact me. Was that manual? Automated? Semi-automated?</li>
<li>I invoke the break glass process&hellip; wait another second. Once I&rsquo;m involved, I have to bring information into my head. That information came from humans and systems. I have to then decide a course of action. I guess we&rsquo;d call that manual? (Although &ldquo;manual&rdquo; derives from Latin &ldquo;manus&rdquo; which means hand powered, not brain powered.) Invoking the break glass process is an action in a system that I trigger by entering a rationale and clicking a button.</li>
<li>to ssh into a server&hellip; entirely facilitated by the systems.</li>
<li>to run a bash script&hellip; does a bash script count as automated? Or is it manual because I had to invoke the script? What if there&rsquo;s no script but a wiki page with a list of commands I keystroke each time? Sounds more manual, but I&rsquo;m still invoking tools that already exist.
At some level, everything above <a href="https://www.youtube.com/watch?v=Sr9mmsLQmYs">toggling a program</a> is automated.</li>
</ul>
<h1 id="out-of-the-morass">Out of the Morass</h1>
<p>Instead of applying a blanket statement like &ldquo;manual&rdquo; or &ldquo;automated&rdquo;, we should look more closely. Specifically, what actions are being executed by which people or systems via which tools in response to which stimuli.</p>
<p>When we engage with detail at that level we can begin to ask and answer more useful questions than &ldquo;is it automated&rdquo;. For example:</p>
<ul>
<li>How long does it take from the stimulus to the action? Bear in mind that <a href="https://en.wikipedia.org/wiki/Pilot-induced_oscillation">shorter is not always better</a>.</li>
<li>What is the probability of error in performing the action? Toggling in that 1401 program&hellip; pretty high probability of error. Running a bash script&hellip; low probability of error. (But that probability rises geometrically with each argument to the script!)</li>
<li>What judgement or decision-making is required to choose an action in response to a stimulus? As we build ever-more-powerful levers to move our systems, and particularly as we give our systems their own internal feedback loops through the control plane, we need to think of them more like cybernetic systems. (Think about PID controllers, Kalman filters, inertial models, or creating a radar track from a series of intermittent &ldquo;blips.&rdquo;)</li>
</ul>
<p>Breaking the question down this way won&rsquo;t help us answer whether something is &ldquo;automated&rdquo; or &ldquo;manual.&rdquo; But it will help us answer how likely the process is to deliver availability, stability, security; or conversely, how likely it is to amplify noise, create oscillation, or induce drag.</p>
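<p>The parenthetical about script arguments can be illustrated with a small Python sketch. It assumes, purely for illustration, that each argument is an independent chance to make a mistake with the same probability; the function name and the 2% figure are hypothetical. Under that assumption the chance of a clean run shrinks geometrically with the number of arguments, so the chance of at least one mistake climbs toward certainty.</p>

```python
def chance_of_any_mistake(per_argument_error, argument_count):
    """Probability that at least one argument is wrong, assuming each
    argument fails independently with the same probability."""
    chance_all_correct = (1.0 - per_argument_error) ** argument_count
    return 1.0 - chance_all_correct


if __name__ == "__main__":
    # Hypothetical 2% chance of fumbling any single argument.
    for count in range(6):
        print(count, round(chance_of_any_mistake(0.02, count), 4))
```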
]]></content></entry><entry><title>Blocker? Pre-requisite.</title><link href="https://michaelnygard.com/blog/2020/09/blocker-pre-requisite./"/><id>https://michaelnygard.com/blog/2020/09/blocker-pre-requisite./</id><published>2020-09-22T10:27:07-05:00</published><updated>2020-09-22T10:27:07-05:00</updated><content type="html"><![CDATA[<p>In discussions about change in a complex system I commonly hear people object, “We can’t do that because X.”</p>
<p>(That statement often follows a passive-aggressive prelude such as “That’s all well and good” or “being tactical for a moment.” Depending on your organizational culture you may also hear “That’s great in theory&hellip;” Or if your company is more aggressive-aggressive, &ldquo;Get real!&rdquo;)</p>
<p>My advice is to reformulate that statement. Treat the blocker as a missing prerequisite: “In order to do that, X must be true. Let’s see what it would take.”</p>
<p>At that point, you may find “We can’t do X because Y.” Keep going: turn Y into a prerequisite for X. As you continue this process, you’re building out a “future reality tree”. It is a tree of preconditions that ultimately arrive at a desirable effect.</p>
<p>Now comes the really hard part. You have to scrutinize the resulting tree. I recommend using the “Categories of Legitimate Reservation” that emerged from <a href="https://www.tocinstitute.org/eliyahu-goldratt.html">Eliyahu Goldratt</a>’s work on the <a href="https://www.tocinstitute.org/theory-of-constraints.html">Theory of Constraints</a>.</p>
<p>Once you’re satisfied that the tree is a true depiction of the preconditions, you need to get brutally honest and look for unintended consequences. For every precondition in your tree, ask “what <strong>other</strong> effects will result.” Those effects are <a href="https://www.michaelnygard.com/blog/2020/06/consequences-are-not-pros-or-cons/">consequences</a>. You must add those to your tree, otherwise you only consider benefits not costs or drawbacks.</p>
<p>When you’re done with this tree, you need to evaluate it. Does the net result of all the consequences produce a better outcome than the situation you’re in? Are the actions needed to create the preconditions possible? Feasible?</p>
<p>If you’ve truly captured the prerequisites and consequences, then both the people who support the changes and the people who dislike them should be able to agree on the truth of the tree. If not, you are missing preconditions, you disagree about the likelihood of the consequences, or you are working from different sets of axioms.</p>
]]></content></entry><entry><title>Delay Induces Lamination</title><link href="https://michaelnygard.com/blog/2020/09/delay-induces-lamination/"/><id>https://michaelnygard.com/blog/2020/09/delay-induces-lamination/</id><published>2020-09-21T00:00:00+00:00</published><updated>2020-09-21T00:00:00+00:00</updated><content type="html"><![CDATA[<p>I’ve seen a repeated pattern that plays out in many companies. Delay, or more accurately, the perception of delay induces the creation of “extra” layers in the architecture. The pattern goes like this:</p>
<ol>
<li>A component or subsystem needs to add a capability to serve some end-user need.</li>
<li>It will take &ldquo;too long&rdquo; to implement that capability in the component. (This is where the perception part really steps in.) Maybe the team is stretched too thinly. Maybe the capability is low value relative to the rest of the pipeline and gets scheduled out in the future. Maybe the team has a lot of technical debt to contend with. Or maybe it just really does take a long time to implement in a particular layer.</li>
<li>The requestor then moves up the call stack and looks for a component at a layer closer to the end user, so the capability can be added there. Often this means introducing a new layer between the end user and the &ldquo;slow&rdquo; component.
This might be a kind of strategic maneuver to engulf and extinguish the other component. In a strongly political environment, you will see this play out as executives jockey for position against each other.
It might be a good faith effort to create a new &ldquo;orchestration&rdquo; layer to bring together diverse capabilities.</li>
<li>If the effort succeeds, there is a loss of coherence: the new layer never implements the same interface as the one it decorates. So callers must decide which layer to invoke. Behavior differs. Maybe even the data available differs.</li>
<li>If there is more than one community of end users, they are likely to pick different layers to interact with. Legacy users may prefer to stick with calls to the lower layer, as they see only cost and no benefit to switching. Newer users may prefer the newer layer, especially if its interface style is more contemporary.</li>
</ol>
<p>The net result is:</p>
<ul>
<li>Increase in complexity and technical debt.</li>
<li>Increase in &ldquo;organizational debt&rdquo; (measured by the number of teams needed to effect a user-visible change.)</li>
<li>Customer frustration once they experience channel disparity.</li>
</ul>
]]></content></entry><entry><title>Complexity Collapse</title><link href="https://michaelnygard.com/blog/2020/09/complexity-collapse/"/><id>https://michaelnygard.com/blog/2020/09/complexity-collapse/</id><published>2020-09-20T11:53:00-05:00</published><updated>2020-09-20T11:53:00-05:00</updated><content type="html"><![CDATA[<p>There&rsquo;s a pattern I&rsquo;ve observed a few times through scientific and computing history. I think of it as &ldquo;complexity collapse&rdquo;. It&rsquo;s probably related to Kuhn&rsquo;s <a href="https://en.wikipedia.org/wiki/Paradigm_shift">paradigm shift</a>.</p>
<p>The pattern starts with an approach that worked in the past. Gaps in the approach lead to accretions and additions. These restore the approach to functionality, but at the expense of added complexity.</p>
<p>That added complexity at first appears preferable to rebuilding the approach from the ground up. Eventually, however, the tower of complexity becomes impossible to extend further. At this point, the field is ripe for a complexity collapse and replacement with a fundamentally different approach.</p>
<p>In the realm of science, this complexity collapse has led to the most famous reformulations in history:</p>
<ul>
<li>
<p>Ancient astronomers assumed that the heavens were perfect. The stars were permanently fixed to a sphere, except for the &ldquo;wanderers.&rdquo; Planets and our moon, being heavenly bodies, must move in circles. The fly in the ointment was that circles alone could not explain apparent retrograde motion. Hipparchus and Ptolemy believed the explanation to be <a href="https://en.wikipedia.org/wiki/Deferent_and_epicycle">epicycles</a> &ndash; circles superimposed on circles. Eventually, Copernicus showed that the number of epicycles needed would be drastically reduced with a heliocentric model. However, further improvements in optics and measurements caused the epicycles to proliferate again.</p>
</li>
<li>
<p>Kepler swept the epicycles away with a clean, simple explanation: orbits are ellipses. His three laws, derived with the aid of Tycho Brahe&rsquo;s incredibly accurate observations, described the motion of all the wanderers with a single explanation. Newton later showed the inverse square law of gravity would produce those ellipses. Newton <strong>could not have</strong> discovered the universal law of gravitation in the paradigm of epicycles.</p>
</li>
<li>
<p>Near the end of the nineteenth century, physicists faced a similar tower of complexity when it came to explaining <a href="https://en.wikipedia.org/wiki/Black-body_radiation">black-body radiation</a> spectra. All the existing models for light, heat, and emission predicted much higher energy radiation at high frequencies than was actually observed. This would imply something called the &ldquo;ultraviolet catastrophe&rdquo; (which should absolutely be your next band name). It meant the night sky should be blazing with hard ultraviolet light. Not just that, but the farther you looked to the high frequency end of the spectrum, the higher the energy you would find. In other words, a black-body radiator could produce nearly infinite energy by just sitting there.</p>
</li>
<li>
<p>As with the epicycles, the first response was to add adjustments to fix the ultraviolet catastrophe within existing equations of classical mechanics. Many such models were created by theorists whose names are only known to students of science history today. They all focused on adding corrective terms&ndash;based on unknown mechanisms&ndash;to the high end of the spectrum.</p>
</li>
<li>
<p><a href="https://en.wikipedia.org/wiki/Max_Planck">Max Planck</a> showed instead that the entire observed spectrum could be explained with one simple law. It only required that light came in particles (later called &ldquo;photons&rdquo;) whose energy depended on their <strong>wavelength</strong> rather than their mass times velocity. Planck swept away the complexity of the old model, replaced it with a simple set of equations, and laid the foundation for quantum mechanics.</p>
</li>
<li>
<p>In the Java programming world, the challenge of building data-based shared systems with HTML front ends led a collection of vendors (virtually all gone now, absorbed into either Oracle or IBM) to create the &ldquo;Java Enterprise Edition&rdquo; specifications including the notorious Enterprise JavaBeans (EJB). This built on a tower of complex specifications for remote invocation and activation, interface descriptions, and several roles that didn&rsquo;t exist before (or since.) This stack could indeed allow programmers to create HTML based applications with a database hiding in the shadows.</p>
</li>
<li>
<p>The Spring framework emerged as an alternative that focused instead on &ldquo;plain old Java objects&rdquo; (POJOs). It replaced complex interactions across development time, configuration time, deployment time, and run time with a simple model: objects that could be &ldquo;injected&rdquo;. Then it offered a collection of libraries that included classes useful for building the kind of applications developers needed to build.</p>
</li>
</ul>
<p>Admittedly, the examples from astronomy and quantum physics are more fundamental to our understanding of the universe than XML-based dependency injection. But these examples all illustrate a similar dynamic. Complexity accumulates, a new theory replaces the old one, leading to complexity collapse.</p>
<p>All those examples include a common coda as well: complexity grows again!</p>
<ul>
<li>
<p>The elliptical orbits of Kepler and Newton work when bodies are far from very massive objects. There is a &ldquo;correction&rdquo; needed. That correction was predicted in 1915 and observed in 1919 in what may be the only planetary occultation to reach the front page of the New York Times. The corrected theory neatly explained the same simple elliptical orbits. It also predicted that Mercury, being very close to our Sun, would exhibit orbital precession because it made a simple ellipse on curved space (a geodesic ellipse.) That new theory explained the earlier results along with some new ones&hellip; but we did have to fundamentally change our view of space and time, and open the door to black holes, the twins paradox, and a host of other counterintuitive (but verified!) phenomena. Einstein&rsquo;s beautiful equation fits on a single line of one page&hellip; but it takes a stack of books on one side of you to understand the equation, and a taller stack of books on the other side to explore the incredible implications.</p>
</li>
<li>
<p>Planck&rsquo;s law is simple, but nobody would make the same claim about the quantum mechanics it ushered in. The more we pursued Planck&rsquo;s implications, the weirder our universe got.</p>
</li>
<li>
<p>Of Spring, many people have quoted Harvey Dent from &ldquo;The Dark Knight&rdquo;&hellip; “You either die a hero or live long enough to see yourself become the villain.”</p>
</li>
</ul>
<p>Today, the contradictions between quantum mechanics and general relativity lead many physicists to look for a new model. Not adjustments but a new paradigm to sweep away and unify the towers of complexity in both fields.</p>
<p>In the realm of data-based applications with web-ish interfaces, complexity collapse led many to embrace Ruby on Rails or Node.js. Both ecosystems have had mini-collapses but no complete replacement, yet.</p>
]]></content></entry><entry><title>Staggering Skeleton</title><link href="https://michaelnygard.com/blog/2020/09/staggering-skeleton/"/><id>https://michaelnygard.com/blog/2020/09/staggering-skeleton/</id><published>2020-09-19T20:09:53-05:00</published><updated>2020-09-19T20:09:53-05:00</updated><content type="html"><![CDATA[<p>We&rsquo;ve talked before about a walking skeleton. That is a fully connected, but not very functional, system that includes all the major integrations. It serves to demonstrate that <strong>anything</strong> at all can run in the expected topology.</p>
<p>But some languages and frameworks ask you to get more correct to form a walking skeleton. Strongly typed languages, frameworks that require you to run from a non &ldquo;-SNAPSHOT&rdquo; library version, deployment tools that only fetch from official repositories, etc.</p>
<p>On the other hand, some languages and ecosystems let you put a bunch of half-broken, mostly inconsistent crap together and it still <strong>kind of</strong> works. There&rsquo;s a good chance that the copy-and-pasted monstrosity will fall apart if you poke it with unexpected input. And it might blow up, crash, or delete the contents of your high school permanent record. But it still works just enough to say &ldquo;I can improve on it from here.&rdquo;</p>
<p>This is a staggering skeleton. It could fall over at any moment, but <strong>eppur si muove</strong>.</p>
<p>Depending on your background, you might view the staggering skeleton as yet another way we allow a mostly broken stack of complex, unreliable software to keep proliferating. Or you might say it makes an easier on-ramp and allows more people to contribute without forcing them through Hindley-Milner hoops.</p>
<p>Either way, you can&rsquo;t deny that staggering skeletons were a big influence on the web. Early websites were a lot of copy-paste-and-modify. That&rsquo;s part of how the web grew so fast.</p>
]]></content></entry><entry><title>Weakness Invites Competition</title><link href="https://michaelnygard.com/blog/2020/09/weakness-invites-competition/"/><id>https://michaelnygard.com/blog/2020/09/weakness-invites-competition/</id><published>2020-09-15T00:00:00+00:00</published><updated>2020-09-15T00:00:00+00:00</updated><content type="html"><![CDATA[<p>Today, nobody wants to start up a competitor to Amazon. New ecommerce retailers aim at niche markets because Amazon is such a juggernaut and fierce competitor that it would be foolhardy to go against them.</p>
<p>Those niche retailers look at Amazon as an <em>exit strategy</em> more than a competitor. Like Microsoft in the 90&rsquo;s, Amazon isn&rsquo;t the competition, they&rsquo;re the <em>environment</em> that any entrant deals with.</p>
<p>Competitors emerge when they sense an opportunity to take away market share from a weakened incumbent. This starts at the periphery: a new entrant takes away a small, uninteresting, or insignificant portion of the market.</p>
<p>The incumbent gradually finds themselves hemmed in by upstarts each nibbling away at their fringes.</p>
<p>The upstarts will expand the edges into areas the incumbent never realized were part of their TAM. Meanwhile, they make incursions into the core.</p>
<p>Eventually one or two of the upstarts will become the dominant player in the new, expanded market.</p>
<p>This is the classic &ldquo;Innovator&rsquo;s Dilemma.&rdquo; It starts when the dominant player is seen as vulnerable.</p>
]]></content></entry><entry><title>Scaffold or Straightjacket?</title><link href="https://michaelnygard.com/blog/2020/08/scaffold-or-straightjacket/"/><id>https://michaelnygard.com/blog/2020/08/scaffold-or-straightjacket/</id><published>2020-08-27T00:00:00+00:00</published><updated>2020-08-27T00:00:00+00:00</updated><content type="html"><![CDATA[<h1 id="scaffold-or-straightjacket">Scaffold or Straightjacket?</h1>
<p>Douglas Adams&rsquo; classic sci-fi comedy novel &ldquo;The Hitchhiker&rsquo;s Guide to the
Galaxy&rdquo; opens with a bulldozer approaching Arthur Dent&rsquo;s house. Since Arthur is
still inside the house, he is naturally concerned.</p>
<p>When Arthur confronts the foreman of the demolition crew, he is informed that his house is to be destroyed to make way for a highway bypass. When discussing the public notice that the local planning office had posted, they have this conversation:</p>
<p>“But the plans were on display…”</p>
<p>“On display? I eventually had to go down to the cellar to find them.”</p>
<p>“That’s the display department.”</p>
<p>“With a flashlight.”</p>
<p>“Ah, well, the lights had probably gone.”</p>
<p>“So had the stairs.”</p>
<p>“But look, you found the notice, didn’t you?”</p>
<p>“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked
filing cabinet stuck in a disused lavatory with a sign on the door saying
&lsquo;Beware of the Leopard&rsquo;.”</p>
<p>For the record, there was no leopard.</p>
<p>This is a funny moment for both the absurdity and the familiarity. Anyone who
has interacted with a bureaucracy can recognize their experience in Arthur&rsquo;s.
Unfortunately, we tend to attach the name <strong>process</strong> to both that kind of
experience and a very different one.</p>
<h2 id="process-as-accidental-constraint">Process as (Accidental) Constraint</h2>
<p>Some processes are deliberately designed to limit or constrain the &ldquo;consumer&rdquo; of
the process. These are the exception though.</p>
<p>Most of the time, a group or department manager will create a process for how
that particular group does its work. Where the trouble arises is that an Arthur
Dent doesn&rsquo;t just interact with that one group. Instead, he has to deal with
several groups that each have their own processes. Each group knows their own
process, but probably has no view into the processes of the other groups. They
can point Arthur from their own department to another (to go get a signed form
of some kind or another.)</p>
<p>Each group acted reasonably, but the experience <strong>from Arthur&rsquo;s point of view</strong>
is absurd.</p>
<p>As a personal anecdote, my wife is a US citizen who was born to two US parents
in a US Army field hospital in Bangkok, Thailand. Consequently she had dual
citizenship until age 18. At that point, the US State Department contacted her
to declare which citizenship she intended to keep. She had to send a form back
to that department, including a consular certificate of natural birth. Where
would that certificate come from? The US State Department. In other words, she
had to send a request to one office in the State Department to get a document to
send back to another office in the State Department. For years we have joked
that those offices are probably across the hall from each other.</p>
<p>We have here what the outside observer experiences as one &ldquo;process.&rdquo; But no-one
can tell them the entire process because there is no global designer. It is a
piecemeal assembly of constantly-changing internal departmental processes. Thus the whole
picture is shrouded (the lights had gone) and there are leopards.</p>
<h2 id="taiichi-ohnos-kind-of-process">Taiichi Ohno&rsquo;s Kind of Process</h2>
<p>Taiichi Ohno created what we now call the Toyota Production System. It has
inspired decades of study in quality and rapid improvement. From TPS we have
gained vocabulary like &ldquo;kanban&rdquo;, &ldquo;andon cord&rdquo;, and &ldquo;kaizen.&rdquo; You could get a
Master&rsquo;s level course in process design by studying just Toyota and Waffle
House. (The American restaurant chain.)</p>
<p>One of the most eye-opening things about TPS is how they approached processes.
Every work station had the work process printed and posted right where the work
is done. Every process was updated almost every day. In fact, Ohno would walk
the factory floor looking for process documents that looked aged: stained paper,
yellowing, tears, etc. He would then ask the worker why they had not learned
anything in such a long time. He would then ask that worker&rsquo;s manager why
<strong>they</strong> had not learned anything. In TPS, kaizen is not a big enterprise
initiative&hellip; it happens a thousand times a day in small groups of workers and
their managers solving problems together.</p>
<p>On a previous project, we had an XP lab full of pairing stations. We wanted all
the pairing stations to be identical: same OS, same configuration, same IDE.
That way any pair could take any station on any day and be productive. To make
this work, we had a wiki page with setup instructions. Every time we needed to
do a fresh setup, we would walk through the instructions. I wrote the initial
instructions, but even so I walked through the instructions each time to make
sure I incorporated improvements that other people had made. If we found errors,
we updated the process. If we found ways to improve efficiency, we updated the
process. In fact, the most common kind of problem came because we didn&rsquo;t do the
process <strong>often enough</strong>. Today we would probably reimage one station every
day to make sure we kept pressure on improving that process. The consistency of
the stations paid dividends every day because we never had contention for &ldquo;the QA
machine&rdquo; or &ldquo;the machine with the memory card reader.&rdquo;</p>
<h2 id="scaffolding-versus-straightjackets">Scaffolding versus Straightjackets</h2>
<p>It&rsquo;s an unfortunate collision in the English language that both of these
experiences have the word &ldquo;process.&rdquo;</p>
<p>Taiichi Ohno&rsquo;s kind of process is a scaffolding. It supports the work and lifts
up the worker to perform at a higher level of quality. It captures the best of
what we&rsquo;ve learned about how to do the work so that everyone can benefit.</p>
<p>Because the process is written and posted right with the work, it means that
changing the process document <strong>actually</strong> changes the process. (As opposed to
changing the document then holding training sessions, sending work in progress
back to square one, and having stragglers following the old process for months.)
In other words, they write the process down exactly so it can be changed!</p>
<p>Ohno&rsquo;s processes allow the worker to improve his or her own work. The Arthur
Dent style of process is defined by the worker <strong>for other people to follow</strong>.
The difference is immense.</p>
]]></content></entry><entry><title>Deleting From Databases is Not Cleanup</title><link href="https://michaelnygard.com/blog/2020/08/deleting-from-databases-is-not-cleanup/"/><id>https://michaelnygard.com/blog/2020/08/deleting-from-databases-is-not-cleanup/</id><published>2020-08-05T00:00:00+00:00</published><updated>2020-08-05T00:00:00+00:00</updated><content type="html"><![CDATA[<p>Creating thousands or millions of entities and then deleting them does not
return your database to its initial state.</p>
<p>Queries won&rsquo;t show the deleted entities, but operational results can.</p>
<p>For example, a table in an RDBMS may have extra storage segments allocated to
it. These can generate higher I/O times until someone runs an analyze job to
reset the table stats for the query planner. Some databases treat &ldquo;DELETE FROM
USER&rdquo; very differently from &ldquo;TRUNCATE USER&rdquo;.</p>
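<p>As a rough illustration, here is a minimal sketch using SQLite through Python&rsquo;s standard <code>sqlite3</code> module. SQLite is only a stand-in for an RDBMS here, the table name <code>app_user</code> is hypothetical, and the sketch assumes SQLite&rsquo;s default setting of no automatic vacuuming. Other engines and other settings will behave differently.</p>
<pre><code class="language-python">import os
import sqlite3
import tempfile


def page_counts_around_delete(row_count=5000):
    """Return (pages_before, pages_after, free_pages, visible_rows)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    try:
        # Hypothetical table; the name and payload are only for illustration.
        conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO app_user (name) VALUES (?)",
            (("user-%06d" % i,) for i in range(row_count)),
        )
        conn.commit()
        pages_before = conn.execute("PRAGMA page_count").fetchone()[0]

        # Delete every row, then look at what the storage layer still holds.
        conn.execute("DELETE FROM app_user")
        conn.commit()
        pages_after = conn.execute("PRAGMA page_count").fetchone()[0]
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
        visible_rows = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()[0]
        return pages_before, pages_after, free_pages, visible_rows
    finally:
        conn.close()
</code></pre>
<p>Under those assumptions the query reports zero rows, yet the file keeps the pages it grew to, with the emptied pages parked on a free list until someone runs <code>VACUUM</code>.</p>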
<p>Some non-relational DBs use tombstone records to indicate where a deleted entity
had been. That&rsquo;s to facilitate eventual consistency when propagating the
deletion overlaps with propagating other modifications.</p>
]]></content></entry><entry><title>Narrow but Deep?</title><link href="https://michaelnygard.com/blog/2020/07/narrow-but-deep/"/><id>https://michaelnygard.com/blog/2020/07/narrow-but-deep/</id><published>2020-07-27T07:20:35-05:00</published><updated>2020-07-27T07:20:35-05:00</updated><content type="html"><![CDATA[<p>In &ldquo;A Philosophy of Software Design,&rdquo; (ISBN-13: 978-1732102200) John Ousterhout
describes the ideal functional interface as &ldquo;narrow but deep.&rdquo; That is, it
should not expose many methods or functions, but the ones it does expose should
be powerful.</p>
<p>I have mixed reactions to this principle, so I&rsquo;d like to explore some examples
that support it and others that argue against it. Throughout this section, my
lens is malleability.</p>
<p>First, imagine a somewhat typical Java domain object with a &ldquo;broad but shallow&rdquo;
interface. That is, it exposes getters and setters for many attributes. That
gives it a wide surface area. The functionality provided by those methods is
slim. One could argue (and I have) that this is no better than making the
object&rsquo;s attributes public. It adheres to a naming convention that was created
for 90&rsquo;s era GUI builders and the pedantic rule that members ought to be
private.</p>
<p>Thin as that Java object&rsquo;s interface is, it can still inhibit change if any of
the members are references to other objects. A caller must navigate a graph of
references, thereby coupling to what should be the internal structure of the
object and preventing the object from changing those internals. (cf. <a href="http://wiki.c2.com/?LawOfDemeter">The Law
of Demeter</a>) I will consider this example as
supportive of the &ldquo;Narrow but Deep&rdquo; principle, in that we see a clear failure
mode of the contrapositive.</p>
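<p>To make that coupling concrete, here is a small sketch in Python rather than Java, for brevity. The classes <code>Order</code>, <code>Customer</code>, and <code>Address</code> are hypothetical, as are both label functions. One caller walks the reference graph, the other asks the object a single question.</p>
<pre><code class="language-python">class Address:
    def __init__(self, postal_code):
        self.postal_code = postal_code


class Customer:
    def __init__(self, address):
        self.address = address


class Order:
    """Hypothetical domain object, used only for illustration."""

    def __init__(self, customer):
        # Exposed attribute: callers can walk the graph from here.
        self.customer = customer

    def shipping_postal_code(self):
        # Narrower alternative: the order answers the question itself,
        # so its internal structure can change without touching callers.
        return self.customer.address.postal_code


def label_by_navigation(order):
    # Couples the caller to Order, Customer, and Address at once.
    return "SHIP TO " + order.customer.address.postal_code


def label_by_asking(order):
    # Couples the caller only to Order.
    return "SHIP TO " + order.shipping_postal_code()
</code></pre>
<p>Both functions produce the same label today. Only the second survives if the order later stores its shipping address somewhere other than on the customer.</p>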
<p>Second, consider a more intelligent Java object that does not merely expose
attributes but provides behavior beyond &ldquo;addXxxListener&rdquo; or &ldquo;addOrderLine&rdquo;. It
likely has a wider interface, making it &ldquo;broad but deep&rdquo;. Would this object
inhibit change? Possibly. In this case it largely depends on how much of that
surface area any particular caller engages. The broader the object&rsquo;s interface,
the more specialized its use becomes. A very wide interface on an object that is
used just once indicates it is exquisitely adapted to its current usage. One
would not expect to use it in different compositions. On the other hand, an
object with the same wide interface might be used in multiple contexts where the
additional breadth indicates affordances added to facilitate reuse. This style
would be characteristic of Smalltalk or its cousin Objective-C. &ldquo;Broad but
Deep&rdquo; then sits on the border for me. I think it can work but easily becomes a
barrier to change.</p>
<p>Third, let us consider a case that we might call &ldquo;Narrow but Too Deep.&rdquo; A very
narrow interface would be something like the Interpreter pattern from the Gang
of Four (ISBN-13: 978-0201633610). An interpreter has basically one method
<code>interpret</code> which takes an object that supplies instructions. Perhaps the
argument is an AST or even a string. This is very narrow and very deep. How does
it do on change?</p>
<p>Change in the caller is very well handled. The caller can readily supply a
different set of instructions to achieve new behavior. Change within the
interpreter is a more complex question. Changes to the <em>implementation</em> of the
<code>interpret</code> method are easily done. Callers have no visibility into the
machinery of execution. Changes to the <em>instruction set</em> are more difficult.
Additions to the instructions are easily accomplished. Forward-compatible
modifications to the instructions are feasible (though they may or may not be
easy.) <em>Removal</em> of instructions will be difficult. That is because callers have
absolute freedom to construct their instructions however they like. Thus,
narrowing the instruction set either requires an extended deprecation period for
callers to upgrade, or it requires the ability for the interpreter&rsquo;s authors to
change the call sites.</p>
<p>When we consider those change cases together, we see that a) expansion is easy;
b) modification is possible if it is forward compatible; and c) contraction is a
breaking change. These are the characteristics of an interface! We have created
a <em>new level</em> in which we&rsquo;ve defined a broad (and potentially shallow)
interface: the instruction set. This should not surprise us&ndash;after all the
pattern is called &ldquo;Interpreter&rdquo; so the fact that we&rsquo;ve created a new language is
implicit in the pattern. It is the interface in that new language which becomes
challenging to evolve.</p>
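<p>A toy version of this shows where the breadth hides. The sketch below assumes a tiny stack machine with three made-up instructions (<code>push</code>, <code>add</code>, <code>mul</code>) and a hypothetical <code>retire</code> method that stands in for contracting the instruction set. It is not a full rendering of the Gang of Four pattern, just enough to show the change cases.</p>
<pre><code class="language-python">class Interpreter:
    """One public entry point; the breadth lives in the instruction set."""

    def __init__(self):
        # Hypothetical instruction set for a tiny stack machine.
        self._ops = {
            "push": self._push,
            "add": self._add,
            "mul": self._mul,
        }

    def interpret(self, program):
        stack = []
        for name, *args in program:
            if name not in self._ops:
                raise ValueError("unknown instruction: " + name)
            self._ops[name](stack, *args)
        return stack

    @staticmethod
    def _push(stack, value):
        stack.append(value)

    @staticmethod
    def _add(stack):
        right, left = stack.pop(), stack.pop()
        stack.append(left + right)

    @staticmethod
    def _mul(stack):
        right, left = stack.pop(), stack.pop()
        stack.append(left * right)

    def retire(self, name):
        # Contraction: any caller still emitting this instruction breaks.
        del self._ops[name]
</code></pre>
<p>Adding a fourth instruction touches nothing that callers already wrote. Calling <code>retire("mul")</code> breaks every program that multiplies, and the interpreter&rsquo;s author has no way to find those programs from inside the class.</p>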
<p>(Best advice about the interface in that language: you are a language designer.
Design carefully. Be conservative about additions, because whatever you add will
be very difficult to retract.)</p>
<p>We can see this same effect with interprocess communication interfaces as well.
HTTP offers a narrow interface: headers, a handful of methods, a URL, and a
payload. (The headers are probably the broadest part, especially when you
consider their mutual interactions. But most non-browser use of HTTP is
restricted to a tiny handful of those headers.) HTTP is too narrow by itself, so
application programmers have variously adopted XMLRPC, SOAP, REST, and GraphQL
to provide a new level of language atop raw HTTP. Let&rsquo;s consider REST for a
moment.</p>
<p>As a new level, we should think of a collection of REST resources as a language.
Indeed, we commonly see resource representations, URL schemas, and response
codes defined with an interface definition language (IDL) called
<a href="https://swagger.io/specification/">OpenAPI</a>. Looking back through previous IPC
mechanisms, we always find some kind of IDL, whether it is called as such or
not. That IDL in fact defines the new language level. The collection of IDLs in
play within the boundaries of a set of collaborating applications supplies the
grammar of that specific distributed system. Perhaps this is one reason why it
is so difficult to achieve a coherent distributed system, because the grammar is
amalgamated from many disparate sources that lack an overall design.</p>
<p>Another example of &ldquo;Narrow but Too Deep&rdquo; would be SQL. If you take a hard look at
JDBC and strip away everything that is just there to construct other JDBC
objects, you have basically two parts: execution and introspection.
Introspection allows Java code to examine the constructs created and consumed by
the SQL language. Ignore that for a moment and consider execution. Executing a
SQL statement from inside an application closely resembles executing it from a
command line. Submit a string to the database and read the results. Most of SQL
reduces to one method: <code>execute</code>. There are variations that serve only to bridge
between the two language levels: batching, cached statements, query versus
modification. I consider these to be accumulated cruft that are not the essence
of the interface.</p>
<p>Nobody, anywhere would say that using SQL from inside an application makes it
easier to modify the system. Instead, as with our Interpreter pattern, we have
to consider what the interface is in the new language level. That is, we must
design the SQL interface to enhance malleability. One way to do that is to
create views for consumers rather than having them tie directly to tables. This
is the exact analog of programming to segregated interfaces on objects instead
of directly engaging the objects&rsquo; full complement of methods. Elaborate joins in
application code are the SQL equivalent to violating the Law of Demeter.</p>
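<p>Here is a minimal sketch of the view idea, again using SQLite through Python&rsquo;s <code>sqlite3</code> module. The tables <code>customer</code> and <code>customer_v2</code> and the view <code>customer_directory</code> are hypothetical names chosen for illustration. The consumer&rsquo;s query is written once against the view and is not edited when the table underneath is restructured.</p>
<pre><code class="language-python">import sqlite3


def consumer_query(conn):
    # The consumer knows only the view, never the underlying table.
    return conn.execute(
        "SELECT full_name FROM customer_directory ORDER BY full_name"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Version 1 of a hypothetical schema: one column holds the whole name.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (name) VALUES ('Ada Lovelace')")
    conn.execute(
        "CREATE VIEW customer_directory AS SELECT name AS full_name FROM customer"
    )
    before = consumer_query(conn)

    # Version 2: the table is restructured and the view is redefined to match.
    conn.execute("DROP VIEW customer_directory")
    conn.execute(
        "CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, given TEXT, family TEXT)"
    )
    conn.execute("INSERT INTO customer_v2 (given, family) VALUES ('Ada', 'Lovelace')")
    conn.execute("DROP TABLE customer")
    conn.execute(
        "CREATE VIEW customer_directory AS "
        "SELECT given || ' ' || family AS full_name FROM customer_v2"
    )
    after = consumer_query(conn)
    conn.close()
    return before, after
</code></pre>
<p>The view plays the role of the segregated interface: the schema owner absorbs the change by redefining it, and the caller sees the same rows before and after.</p>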
<p>(Ironically, I usually see this problem solved in the exact opposite way: with a
mapping layer on the caller side that makes it <em>even easier</em> to directly couple
to the precise table definitions.)</p>
<p>We refer to the structure of tables and constraints in the database as a model,
but we could also describe it as a grammar. We can only make statements in the
language of the database that the grammar permits. And again, we find that it is
easy to expand the grammar (schema), may or may not be easy to modify, and very
difficult to contract.</p>
<p>This characteristic of creating a new language level seems to pop up every time
we make the interface at one level sufficiently narrow and deep. Then we need to
worry about coupling and malleability of the new level.</p>
]]></content></entry><entry><title>Consequences are not Pros or Cons</title><link href="https://michaelnygard.com/blog/2020/06/consequences-are-not-pros-or-cons/"/><id>https://michaelnygard.com/blog/2020/06/consequences-are-not-pros-or-cons/</id><published>2020-06-28T00:00:00+00:00</published><updated>2020-06-28T00:00:00+00:00</updated><content type="html"><![CDATA[<p>I&rsquo;ve noticed a pattern in much business writing, including technical writing.
People feel compelled to label every effect as &ldquo;pro&rdquo; or &ldquo;con&rdquo;. I think this
springs from our primary-school training in persuasive writing. As a result,
what should be an engineering analysis often reads like marketing copy.</p>
<p>(A related effect: when writing persuasively, people tend to minimize or
discount the effects they don&rsquo;t like. Richard Feynman <a href="http://calteches.library.caltech.edu/51/2/CargoCult.htm">advised
students</a> to be their
own harshest critics, to find ways to poke holes in their own arguments. It&rsquo;s
the only way to avoid fooling yourself, he said, and you are the easiest person
in the world for you to fool.)</p>
<p>Instead, I suggest we first simply describe consequences, not benefits or
problems. That&rsquo;s because a consequence is just a statement about how the future
will differ from the past. It is <em>objective</em>.</p>
<p>Whether you judge that consequence to be a &ldquo;pro&rdquo; or &ldquo;con&rdquo; depends entirely on
your relationship to the change. If you perceive the change as an improvement to
the status quo then you call it a &ldquo;pro&rdquo;. If you don&rsquo;t like the version of the future
which includes that consequence, then you call it a &ldquo;con&rdquo;. That means labelling
a consequence as a benefit is <em>subjective</em>. It describes the relationship of you
and the change.</p>
<p>What about the changes that you don&rsquo;t particularly like or dislike? The ones
that are neither &ldquo;pro&rdquo; nor &ldquo;con&rdquo;? Most of the time those don&rsquo;t get written down
at all!</p>
<p>(As an aside, I also see technical writeups that include a list of &ldquo;pros&rdquo; for
the recommended solution, where each &ldquo;pro&rdquo; precisely lines up with a &ldquo;con&rdquo; of
the current world. This always tells me the author chose the solution first,
then stated the problem in such a way that their chosen solution appears to be
the best option.)</p>
<p>I recommend that you begin by listing the consequences. Find all the ways that
the future will be unlike the past, if we choose that path. Look for
second-order effects&ndash;the consequences of the consequences.</p>
<p>Look for interactions. How does this approach combine with other systems,
processes, or people?</p>
<p>As you make this list of consequences, try to avoid coloring your thoughts about
the consequences by what your intentions are. Whether you proposed a technical
system, a process change, or a policy difference, once the change is made your
intentions are irrelevant. Only the resulting system state matters. Anything the
system allows will be done regardless of whether that matches your intended
application. Therefore, think beyond the intended outcome or purpose of your
approach. How could this be accidentally misused? Or deliberately abused?</p>
<p>Armed with this list, you will be ready to think about how the consequences
affect you and your organization. This is when you judge whether the effect of
those consequences creates a future that you like better than today.</p>
<p>See also: Rawls&rsquo;s Veil of Ignorance, <a href="https://en.wikipedia.org/wiki/Unintended_consequences">Unintended Consequences</a></p>
]]></content></entry><entry><title>Why did we stop at 2?</title><link href="https://michaelnygard.com/blog/2020/06/why-did-we-stop-at-2/"/><id>https://michaelnygard.com/blog/2020/06/why-did-we-stop-at-2/</id><published>2020-06-24T00:00:00+00:00</published><updated>2020-06-24T00:00:00+00:00</updated><content type="html"><![CDATA[<p>In the dim reaches of Unix history, the first shell was written. It
attached file descriptor 0 as a pipe from the TTY device to a
process. That became &ldquo;stdin&rdquo;. File descriptor 1 is a pipe from the
process out to the TTY. That&rsquo;s &ldquo;stdout&rdquo;. I don&rsquo;t know when FD 2 became
&ldquo;stderr&rdquo; but it was early.</p>
<p>When you write a Unix program, you don&rsquo;t have to open these file
descriptors. They&rsquo;re opened by the parent process, before it uses
&ldquo;exec&rdquo; to load the new program&rsquo;s code. So by the time &ldquo;main()&rdquo; is
called in the child program, FDs 0, 1, and 2 are already connected.</p>
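<p>You can watch this happen from the parent&rsquo;s side. The sketch below uses Python&rsquo;s standard <code>subprocess</code> module as the parent. The child program in <code>CHILD_SOURCE</code> is hypothetical: it opens nothing, checks that descriptors 0, 1, and 2 already exist, and writes to FD 1.</p>
<pre><code class="language-python">import subprocess
import sys

# Hypothetical child program. It opens nothing: it only checks that
# descriptors 0, 1, and 2 already exist, then writes to FD 1.
CHILD_SOURCE = (
    "import os\n"
    "for fd in (0, 1, 2):\n"
    "    os.fstat(fd)\n"
    "os.write(1, b'fds 0-2 ready\\n')\n"
)


def run_child():
    # The parent decides what each descriptor is attached to before the
    # child program starts running.
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return completed.stdout
</code></pre>
<p>The child never learns whether FD 1 is a terminal, a file, or a pipe. That decision belongs entirely to whoever launched it.</p>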
<p>For back end services, we kind of abandoned stdout for a long time, in
favor of logging frameworks that wrote output into files. Then we
added log scrapers and aggregators to gather those logs on a server.</p>
<p>That looked like this:</p>
<ul>
<li>Logging framework (extra dependency in codebase, impediment to
library composition) writes to file</li>
<li>Agent on host tails file and sends to collector</li>
<li>Collector daemon on log aggregator writes to FS there</li>
<li>Search engine indexes logs</li>
</ul>
<p>Recently, stdout has had a bit of a renaissance with the advent of
sidecars. Before your application starts (usually in a container now),
the container platform connects a pipe to FD 1. The other end of that
pipe goes to a socket which is connected to a &ldquo;sidecar&rdquo;. The sidecar
reads from the socket and passes the data along to a log collector.</p>
<p>So now instead of this linkage that requires a logging framework
inside your application, you just use builtin functions like
<code>printf()</code> or <code>System.out.println()</code>. You still have to format the log
line, which might want a library function in your app. But now
different libraries that each spit to stdout can compose nicely. We&rsquo;ll
leave it up to the log collector and indexer to ingest different
formats.</p>
<p>Let&rsquo;s pursue this idea further. What else could we simply provide to
an application by hooking up file descriptors before executing it?</p>
<h2 id="messaging-topics">Messaging Topics</h2>
<p>When an application wants to use messaging, it has to include a client
library that knows how to connect to the messaging service. That
requires authentication so the application has to manage credentials
to supply to the client library to connect to the messaging service.</p>
<p>Those credentials are not part of the application code base; they have
to get mixed in by some build or deployment step.</p>
<p>Because the application has to include a client library, the
application becomes specific to a particular messaging product.</p>
<p>What if we said &ldquo;fd 3 and up are for messaging topics?&rdquo; Each FD could
be bound to a topic as either input or output. The application would
just use &ldquo;send&rdquo; and &ldquo;recv&rdquo; socket operations on those FDs. (If we used
&ldquo;read&rdquo; and &ldquo;write&rdquo; file operations on the FD, we&rsquo;d have to figure out
how many bytes to read. What we really want is &ldquo;a message&rdquo; as a unit.)</p>
<p>It would then be the responsibility of the runtime platform to supply
&ldquo;pipes&rdquo; that connect those FDs for the application to the actual
messaging infrastructure. We would certainly implement that connection
via sidecars again.</p>
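<p>To make that concrete, here is a small Python sketch under some loud assumptions. A Unix datagram socket pair stands in for the platform&rsquo;s &ldquo;pipe&rdquo; because it preserves message boundaries; the function name <code>application</code> and the topic payloads are invented for the example. Everything runs in one process here, where a real launcher would <code>dup2()</code> the descriptor onto FD 3 before exec.</p>
<pre tabindex="0"><code>import socket

# Stand-in for the platform: a datagram socket pair keeps each message whole.
platform_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
topic_fd = app_end.detach()  # hypothetical: the launcher would place this on FD 3

def application(fd):
    """The app knows only an FD number: no client library, no credentials."""
    topic = socket.socket(fileno=fd)
    message = topic.recv(65536)   # one recv() returns exactly one message
    topic.send(b"ack:" + message)
    topic.close()

platform_end.send(b"order-created")
platform_end.send(b"left-in-the-queue")
application(topic_fd)
reply = platform_end.recv(65536)
platform_end.close()
</code></pre>
<p>Even though two messages were waiting, the application&rsquo;s single <code>recv</code> sees only the first one, which is the &ldquo;message as a unit&rdquo; behavior we want.</p>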
<p>With this approach, the application no longer needs a client
library. The platform would be responsible to provide <em>some</em> messaging
ability. Applications that need precise control over acknowledgements
might not be able to use this but simple applications that don&rsquo;t worry
about batching or distributed transactions could go a long way with
basic send and receive operations.</p>
<h2 id="databases">Databases</h2>
<p>Similar situation with databases. Why do we need all kinds of specific
wire protocols? How about a file descriptor that is connected directly
to a database. Write to &ldquo;stddb&rdquo; and the DB gets it as a SQL statement
or query. Read from &ldquo;stddb&rdquo; to read results.</p>
<p>Now the application doesn&rsquo;t need driver libraries in it. Nor does it
need to manage credentials. That would be part of the platform
configuration for the application, so we&rsquo;re separating concerns in a
different way.</p>
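<p>A toy version of &ldquo;stddb&rdquo; might look like the following Python sketch. It is only an illustration: the names <code>sidecar_step</code> and <code>query</code> are hypothetical, an in-memory SQLite database stands in for the real one, results travel back as JSON (my assumption, not a proposal for a wire format), and the sidecar runs inline rather than in a separate process.</p>
<pre tabindex="0"><code>import json
import socket
import sqlite3

app_end, sidecar_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
db = sqlite3.connect(":memory:")  # stand-in for the real database

def sidecar_step():
    """Platform side: read one statement from the FD, run it, write rows back."""
    statement = sidecar_end.recv(65536).decode()
    rows = db.execute(statement).fetchall()
    sidecar_end.send(json.dumps(rows).encode())

def query(statement):
    """Application side: write SQL to 'stddb', then read the result."""
    app_end.send(statement.encode())
    sidecar_step()  # in a real platform this would run in another process
    return json.loads(app_end.recv(65536))

query("CREATE TABLE stations (name TEXT, capacity INTEGER)")
query("INSERT INTO stations VALUES ('station-2', 10)")
rows = query("SELECT name, capacity FROM stations")
</code></pre>
<p>Notice that the application side contains no driver and no credentials. Both live with whoever owns <code>sidecar_step</code>.</p>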
<h2 id="other-uses">Other Uses</h2>
<p>What else could we simplify if we renew the idea that a program&rsquo;s
environment is set up by the runtime that launches the program?</p>
]]></content></entry><entry><title>Time Emerges From Events</title><link href="https://michaelnygard.com/blog/2020/06/time-emerges-from-events/"/><id>https://michaelnygard.com/blog/2020/06/time-emerges-from-events/</id><published>2020-06-18T19:33:35-05:00</published><updated>2020-06-18T19:33:35-05:00</updated><content type="html"><![CDATA[<p>Without an event, no time passes. This may seem like an odd
assertion. You may say, &ldquo;I can see time passing all around me!&rdquo; But
how do you see it? Do you look at the ticking hands of a clock? In a
mechanical clock, each tick is an event: when the tension on an
escapement exceeds the friction between its prong and the gear, and
the escapement knocks over to the other side with the familiar &ldquo;tick.&rdquo;
That motion transfers to gears which torque the hands a bit further
around.</p>
<p>A digital clock? An oscillating quartz crystal resonates at a
frequency, causing a changing voltage. That voltage feeds a
transistor, and when the voltage is high enough the transistor feeds
current to a counter. Transistors inside the counter flip and flop,
eventually charging some LCD segments and discharging others.</p>
<p>Events everywhere.</p>
<p>Then there are the photons that bounce off the clock into your
eyeballs. They excite your retinal neurons which fire signals to your
brain and trigger a whole new cascade of electrochemical activity.</p>
<p>Without all those events, you can&rsquo;t even perceive the current time.</p>
<p>All clocks require physical interactions, whether mediated by springs
and gears, quartz oscillators, or network packets (which arrive as
self-propagating excitations of the electromagnetic field.)</p>
<p>What about computers? How do they understand time? Let&rsquo;s start with
the easy case of a physical machine like a laptop or desktop machine.</p>
<p>Inside the computer is an oscillator, just like in your digital
clock. It may be a piezo-electric quartz oscillator, or it may be an
&ldquo;LC oscillator&rdquo; (a capacitor and an inductor.) That oscillator emits a
voltage to a clock circuit in your CPU which increments a counter. A
program executing on that CPU can run an instruction like <code>RDTSC</code> to
get that counter value. Your operating system gives the impression of
multiple simultaneous programs by generating an interrupt every so
often, which makes the CPU stop what it&rsquo;s doing and go execute
something else. Physical interactions all over the place! There&rsquo;s the
mechanical vibration of the crystal, or the back-and-forth of electric
to magnetic field in the LC oscillator. In the CPU, the transistors
flip on and off shuttling electrons around.</p>
<p>What about a virtual machine? It doesn&rsquo;t have an oscillator, but the
underlying host machine does. So the VM can send an I/O instruction to
ask the &ldquo;hypervisor&rdquo; what its clock says. Or, after waking up the
virtual machine, the hypervisor can just send an I/O packet to the VM
with the current time. More events: all the physical interaction of
the physical host&rsquo;s clock, plus the electron-shuffling of I/O to the
VM.</p>
<p>If you were to somehow stop all those physical interactions, time
would not pass.</p>
]]></content></entry><entry><title>Reading List</title><link href="https://michaelnygard.com/blog/2020/04/reading-list/"/><id>https://michaelnygard.com/blog/2020/04/reading-list/</id><published>2020-04-27T08:36:55-05:00</published><updated>2020-04-27T08:36:55-05:00</updated><content type="html"><![CDATA[
<h1>Architecture &amp; Development<br/></h1>
<h2>Required Reading<br/></h2>
<p>
</p>
<ul>
   <li><a href="http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions">Architecture Decision Records</a><br/></li>
   <li>
      <a href="https://c4model.com/">C4 Model</a>&#160;(Note: we will only use the first 3 C&#39;s.)<br/></li>
   <li>
      <a href="https://itrevolution.com/book/accelerate/">Accelerate</a><br/></li>
   <li>Wardley Maps<br/></li>
   <li>
      Failure Modes and Continuous Resilience<br/></li>
</ul>
<p>
</p>
<h2>Recommended&#160;Reading<br/></h2>
<ul>
   <li>
      <a href="https://www.goodreads.com/book/show/6278270-the-principles-of-product-development-flow">The Principles of Product Development Flow</a></li>
   <li>
      <a href="https://www.amazon.com/Software-Architecture-Practice-3rd-Engineering/dp/0321815734">Software Architecture in Practice</a><br/></li>
   <li>
      <a href="https://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215/ref=pd_sbs_14_1/131-8203722-8537424?_encoding=UTF8&amp;pd_rd_i=0321125215&amp;pd_rd_r=758241ac-8153-11e9-b9be-2339e3b9437a&amp;pd_rd_w=S5SL3&amp;pd_rd_wg=7Db86&amp;pf_rd_p=588939de-d3f8-42f1-a3d8-d556eae5797d&amp;pf_rd_r=BM0852ZW5EY2V17WMXKX&amp;psc=1&amp;refRID=BM0852ZW5EY2V17WMXKX">Domain-Driven Design</a><br/></li>
   <li>Data and Reality, 2ed&#160;(Note, the 3rd edition is not as good. Best to stick with 2nd edition.)<br/></li>
   <li>
      <a href="https://itrevolution.com/book/the-phoenix-project/">The Phoenix Project</a>&#160;- A novel about IT transformation with a devops flavor.<br/></li>
   <li><a href="https://itrevolution.com/the-unicorn-project/">The Unicorn Project</a> - A followup to the Phoenix Project that looks more directly&#160;at development.<br/></li>
   <li>
      <a href="https://www.goodreads.com/book/show/8217748-pattern-oriented-software-architecture-volume-1-a-system-of-patterns">Pattern Oriented Software Architecture</a> - Volumes 1, 3, and 4.<br/></li>
   <li>
      <a href="https://pragprog.com/book/mnee2/release-it-second-edition">Release It!</a> - Design and Deploy Production-Ready Software<br/></li>
</ul>
<h2>Suggested Reading</h2>
<ul>
   <li>Event Sourcing at Nordstrom</li>
   <ul>
      <li>
         Part 1<br/></li>
      <li>
         Part 2<br/></li>
   </ul>
   <li>Enterprise Architecture as Strategy<br/></li>
   <li>
      <a href="https://www.goodreads.com/book/show/604529.Software_by_Numbers">Software by Numbers</a> - Discusses crucial concept of &quot;marketable features&quot; and &quot;architecture elements&quot; to support them. Lays out a method for incrementally delivering architecture as we incrementally deliver features.<br/></li>
   <li>
      <a href="http://markburgess.org/blog_cap.html">CAP and Relativity</a><br/></li>
   <li><a href="http://curtclifton.net/papers/MoseleyMarks06a.pdf">Out of the Tar Pit</a> - The essential paper on essential versus accidental complexity. We are drowning in accidental complexity.<br/></li>
   <li>Ultratestable Coding<br/></li>
   <li>
      <a href="https://martinfowler.com/articles/domain-oriented-observability.html">Domain-Oriented Observability</a><br/></li>
   <li>
      <a href="http://www.donellameadows.org/wp-content/userfiles/Leverage_Points.pdf">Leverage Points: Places to Intervene in a System</a> - Donella Meadows&#39;&#160;cornerstone paper.<br/></li>
   <li>
      <a href="https://www.goodreads.com/book/show/34459.Metaphors_We_Live_By">Metaphors We Live By</a><br/></li>
</ul>]]></content></entry><entry><title>Shared Mutable Team State</title><link href="https://michaelnygard.com/blog/2019/03/shared-mutable-team-state/"/><id>https://michaelnygard.com/blog/2019/03/shared-mutable-team-state/</id><published>2019-03-21T08:36:55-05:00</published><updated>2019-03-21T08:36:55-05:00</updated><content type="html"><![CDATA[<h1 id="shared-state">Shared State</h1>
<p>When programming distributed systems, the hardest kind of data to manage is shared mutable state. It requires some kind of synchronization between writers to avoid missed updates. And, after changes, it requires some kind of mechanism to restore coherence between readers.</p>
<p>I previously wrote about that idea of a <a href="/blog/2018/01/coherence-penalty-for-humans/">coherence penalty</a> as it applies to humans. Following those lines, we might regard the system of development teams in an organization as its own distributed system. Teams pass messages.  Both sides must understand the semantics. Packets get lost. Nodes disappear.</p>
<p>Within that framework, we can consider the same dimensions of state as we would with a distributed computing system:</p>
<ol>
<li>Local, immutable state. Easy.</li>
<li>Local, mutable state. Relatively easy to manage.</li>
<li>Shared, immutable state. Essentially write-once. This is a send-only (unicast or broadcast) item that doesn&rsquo;t require further synchronization. (But see my note later about the time dimension.)</li>
<li>Shared, mutable state. Both synchronization and coherence penalties apply here.</li>
</ol>
<p>So what would constitute shared mutable state between teams?</p>
<h2 id="mutable-state-for-humans">Mutable State for Humans</h2>
<p>Teams and the humans on the teams carry around an understanding of how the system works. That definitely constitutes mutable state.</p>
<p>I think that the <em>metadata</em> used by the software also constitutes shared state. It may be mutable or immutable. More about that shortly.</p>
<p>The software these teams create has shared mutable state of its own. That would be data that the software creates and reads. The data may be at rest in a database or it may be in motion, in the form of messages being passed around.</p>
<p>For the teams that create the software, however, the shared state is the protocol or schema definition. When those change, synchronization and coherence mechanisms are required. To some extent, this is just a consequence of <a href="https://en.wikipedia.org/wiki/Conway's_law">Conway&rsquo;s Law</a>, but it&rsquo;s taken me ten years to understand it.</p>
<h2 id="consequences-of-shared-state">Consequences of Shared State</h2>
<p>For teams to move quickly and independently, we want to minimize the synchronization and coherence delays between teams, in exactly the same way we would do when making the software itself more scalable. So we want to reduce the amount of shared, mutable metadata across team boundaries.</p>
<p>Some corollaries.</p>
<h3 id="less-shared-metadata-means-less-penalties">Less Shared Metadata Means Fewer Penalties</h3>
<p>Every API has a schema. That means every novel API becomes a new piece of shared state. If you expect to evolve that API, you are planning to mutate the state. Find out if there will be multiple writers!</p>
<p>Where possible, favor a new implementation of an existing API to reduce the amount of state involved. Consider using standard media types and representations, or creating local standards. The time spent creating the standard definitely counts as a synchronization delay, but at least it is explicitly recognized rather than buried in Jira tickets. Also, this time spent creating the standard may cause you to create a better definition that won&rsquo;t need to change as much. Thus you trade a larger early penalty for repeated penalties later.</p>
<p>Integration via database table maximizes the need for concurrent mutation of the schema. This is why we&rsquo;ve come to believe that we should avoid such integration. But again, there may be a place to use it effectively, so long as we recognize the effect on our team-scalability.</p>
<h3 id="immutable-metadata">Immutable Metadata</h3>
<p>Shared, immutable data allows consuming software to scale better by avoiding propagation delays. Shared, immutable data also benefits from caching and can use a publication model.</p>
<p>The same goes for teams. API or schema definitions that never change only require publication. But do they allow for change? Yes, with some constraints.</p>
<p>If every change is strictly additive then we can consider the &ldquo;publication date&rdquo; of an updated protocol definition to be part of the protocol&rsquo;s name. Thus, it isn&rsquo;t a revision of the old protocol, but rather a new protocol entirely that derives from the old one without replacing or invalidating it.</p>
<p>For instance, the existence of HTTP/2 does not mean that HTTP/1.1 no longer exists.</p>
<p>Likewise, you may create a new API definition under a new name. As long as you continue supporting the old definition, then you have not mutated the old shared state, you&rsquo;ve just created a new piece of immutable shared data.</p>
<p>The technology we use doesn&rsquo;t make it easy to maintain multiples of some shared state. For example, RDBMSs have no way to express the idea that the new schema is a copy of the old schema with an extra table. Not only is their data model all about &ldquo;update in place&rdquo; but their metadata is also &ldquo;update in place.&rdquo; Similarly, most of our frameworks for writing APIs are too explicit about routes in URLs. They bake in URL parts like &ldquo;/api/v1&rdquo; in every route so it is hard to say that &ldquo;v3&rdquo; is &ldquo;v2&rdquo; with some changes, and &ldquo;v2&rdquo; is &ldquo;v1&rdquo; with some changes.</p>
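<p>For contrast, here is a sketch of what &ldquo;v3 is v2 with some changes&rdquo; could look like if the routing table were just layered data. This is not any real framework&rsquo;s API. The handler names and paths are invented, and I&rsquo;m using Python&rsquo;s <code>collections.ChainMap</code> only because it makes the layering explicit.</p>
<pre tabindex="0"><code>from collections import ChainMap

def list_orders():
    return "orders as of v1"

def list_orders_paged():
    return "orders with paging"

def get_invoice():
    return "invoice"

# Each version names only what it adds or overrides; the rest is inherited.
v1 = ChainMap({"/orders": list_orders})
v2 = v1.new_child({"/invoices": get_invoice})
v3 = v2.new_child({"/orders": list_orders_paged})

def dispatch(version, path):
    """Look the path up through the version's layers and call the handler."""
    return version[path]()
</code></pre>
<p>Publishing <code>v3</code> leaves <code>v1</code> and <code>v2</code> untouched. A caller pinned to <code>v1</code> still gets the original <code>/orders</code> behavior and never sees <code>/invoices</code>, so each version stays an immutable piece of shared state.</p>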
<h3 id="consider-structuring-teams-around-shared-state">Consider Structuring Teams Around Shared State</h3>
<p>This is the dual of Conway&rsquo;s Law. One way to decide team boundaries is around interfaces. That is, set up your teams such that there is a team boundary everywhere you want an interface.</p>
<p>That interface definition is shared state which may be mutable. So, consider also drawing team boundaries to maximize ownership over that state. Transform it from shared state to private state and the rate of mutation matters less. Of course, as soon as you draw those new lines you may have created new interfaces, so look carefully for team designs that reduce the <em>global</em> amount of shared mutable state.</p>
<p>If you follow that approach when considering all the different interfaces you must negotiate, then everything gets sucked into a single gigantic team. I think this is why there&rsquo;s a kind of &ldquo;gravitational&rdquo; attraction that tries to pull interacting pieces of software into one mass.</p>
<p>Maybe it&rsquo;s like the life of a star. The life of a star is the unsteady conflict between gravity and pressure. Gravity tries to collapse the star, which creates fusion. Fusion makes pressure which holds the star up from collapse.</p>
<p>In software, shared mutable state at interface boundaries plays the role of gravity. Taken to the limit you get monoliths. Communication overhead and coherence penalties (scaling quadratically with team size) act like pressure. Taken to their limit you get pico-services with solo owners. Rules like the two-pizza team are meant to impose a constraint via force majeure.</p>
<h2 id="more-to-explore">More to Explore</h2>
<p>Some of what we know from fallible message-passing networks can extend to the system that creates the software systems. But we must also keep in mind that people have resilience mechanisms that computers lack. &ldquo;Hey, did you get my email?&rdquo; actually works with humans. Humans can also switch from discussing their shared state (say, a protocol definition) to negotiating a new meta-model for that shared state (the meta-meta-model for the data the software will pass.) Software systems cannot renegotiate their protocols dynamically.</p>
<p>There may be more insight available from looking at team and organizational structure as a distributed system.</p>
]]></content></entry><entry><title>My Favorite Bit of Language Design</title><link href="https://michaelnygard.com/blog/2018/12/my-favorite-bit-of-language-design/"/><id>https://michaelnygard.com/blog/2018/12/my-favorite-bit-of-language-design/</id><published>2018-12-26T09:08:55-05:00</published><updated>2018-12-26T09:08:55-05:00</updated><content type="html"><![CDATA[<p>An elegant design conserves mechanisms. It combines a small number of primitives in various ways. When I first learned about this elegant bit of design in <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk-80</a>, I laughed with delight.</p>
<p>In Smalltalk, the primitives are &ldquo;object&rdquo; and &ldquo;message&rdquo;. That&rsquo;s basically it &ndash; except for blocks, which we will see a little later. Behavior arises via objects sending messages to each other. In fact, Smalltalk doesn&rsquo;t even need control structures in the language grammar. Objects and messages suffice. How does a language without control structures do anything useful? How can any conditional logic work?</p>
<p>The key is with the class hierarchy for <code>Boolean</code>, <code>True</code>, and <code>False</code>.</p>
<p>In most languages, &ldquo;boolean&rdquo; is a primitive type that doesn&rsquo;t have any behavior. True and false are values of the type boolean. Not so in Smalltalk. <code>Boolean</code> is an abstract class that has two subclasses: <code>True</code> and <code>False</code>.</p>
<p><code>Boolean</code> defines selectors like <code>ifTrue:</code> and <code>ifTrue:ifFalse:</code> but does not implement them. Each parameter is a block: an object wrapping a chunk of behavior that can be invoked later. (Ruby also calls these blocks, but only allows one at the end of a parameter list.) In Smalltalk, arguments are interleaved with the words in the method selectors. Here&rsquo;s an example from <a href="https://squeak.org">Squeak</a>, a modern Smalltalk, in the <code>Character</code> class:</p>
<pre tabindex="0"><code>isSeparator
   &#34;Answer whether the receiver is one of the separator characters--space, 
   cr, tab, line feed, or form feed.&#34;

   | integerValue |
   (integerValue := self asInteger) &gt; 32 ifTrue: [ ^false ].
   integerValue
   	caseOf: {
   		[ 32 &#34;space&#34; ] -&gt; [ ^true ].
   		[ 9 &#34;tab&#34; ] -&gt; [ ^true ].
   		[ 13 &#34;cr&#34;] -&gt; [ ^true ].
   		[ 10 &#34;line feed&#34; ] -&gt; [ ^true ].
   		[ 12 &#34;form feed&#34;] -&gt; [ ^true ] }
   	otherwise: [ ^false  ]
</code></pre><p>The first line just names the method. The quoted string is documentation visible in the class browser. <code>| integerValue |</code> says this method uses one local variable. Then we get to the interesting line. <code>(integerValue := self asInteger)</code> sends the <code>asInteger</code> message to <code>self</code> and assigns the result to <code>integerValue</code>. The assignment returns the value which was assigned, an integer object. Next, the <code>&gt;</code> message is sent to the integer object, with <code>32</code> as a parameter. Yes, comparison &ldquo;operators&rdquo; are also just messages sent to objects. Every number is an object. The result of <code>&gt;</code> is an instance of <code>Boolean</code>. So the paradoxical-seeming <code>ifTrue: [ ^false ]</code> will be sent to whichever <code>Boolean</code> was returned from <code>&gt;</code>. The caret just means &ldquo;return&rdquo; and <code>false</code> is a literal that names the singleton instance of the class <code>False</code>.</p>
<p>That&rsquo;s a lot of messages in one short line of code. Thanks to the hard work of many brilliant programmers and quite a few transistor-doublings since 1980, it performs well today. There are also many tricks with pointer tagging and flyweight objects that make it reasonable to have numbers represented with objects.</p>
<p>Now we get to the punchline and the genius of Smalltalk&rsquo;s little trio for Boolean logic. So <code>True</code> implements <code>ifTrue:</code> something like this:</p>
<pre tabindex="0"><code>ifTrue: trueBlock [
  &#34;We are true -- evaluate trueBlock&#34;

  &lt;category: &#39;basic&#39;&gt;
  ^trueBlock value
]
</code></pre><p>(This sample from <a href="https://github.com/gnu-smalltalk/smalltalk/blob/master/kernel/True.st#L60">GNU Smalltalk</a>. )</p>
<p><code>True</code> knows it is true, so it unconditionally evaluates the block. It won&rsquo;t surprise you to see that <code>False</code> implements <code>ifTrue:</code> like this:</p>
<pre tabindex="0"><code>ifTrue: trueBlock [
  &#34;We are false -- answer nil&#34;

  &lt;category: &#39;basic&#39;&gt;
  ^nil
]
</code></pre><p>All the other variants such as <code>ifTrue:ifFalse:</code> and <code>ifFalse:</code> are implemented similarly. In fact <code>and:</code> and <code>or:</code> operate the same way.</p>
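<p>If Smalltalk syntax is unfamiliar, the same trick can be imitated in Python. This is only a sketch of the idea, not how either language implements booleans. The class and method names are mine, lambdas stand in for blocks, and <code>greater_than</code> is a hypothetical stand-in for sending <code>&gt;</code> to a number.</p>
<pre tabindex="0"><code>import operator

class TrueObject:
    def if_true(self, block):
        return block()        # we are true: always evaluate the block

    def if_true_if_false(self, true_block, false_block):
        return true_block()

    def and_(self, block):
        return block()        # true and x is just x

class FalseObject:
    def if_true(self, block):
        return None           # we are false: never evaluate the block

    def if_true_if_false(self, true_block, false_block):
        return false_block()

    def and_(self, block):
        return self           # false and x is false; x is never evaluated

TRUE, FALSE = TrueObject(), FalseObject()

def greater_than(a, b):
    # The one place this sketch leans on Python's own comparison.
    return (FALSE, TRUE)[operator.gt(a, b)]

answer = greater_than(40, 32).if_true_if_false(
    lambda: "not a separator", lambda: "keep checking")
skipped = greater_than(9, 32).if_true(lambda: "never evaluated")
</code></pre>
<p>There is no <code>if</code> statement anywhere in the two classes. Which branch runs is decided entirely by which object receives the message.</p>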
<p>The beautiful part about this is how a small number of features, used consistently and pervasively, combine to allow simplicity to emerge.</p>
<p>Control structures can be discarded in favor of objects sending messages and evaluating blocks. Polymorphism subsumes conditionality, but it only works if objects are pervasive. If Smalltalk had the same split between boxed numerics versus primitive numbers that Java uses, this wouldn&rsquo;t work. Numbers must be objects. True and false must be objects, not primitive values or puns for distinguished values of <code>uint8_t</code>.</p>
<p>Since I learned about Smalltalk&rsquo;s elegant trio, I&rsquo;ve tried to apply the same principle in my own designs. Maybe we can push an idea farther. Make it more pervasive. Get more mileage out of it. Represent some other behavior (like conditionality) with a simpler but more general idea (like polymorphism.) Ask the question, &ldquo;What if we made <em>everything</em> an X?&rdquo; for some value of X.</p>
]]></content></entry><entry><title>Networking Topics</title><link href="https://michaelnygard.com/blog/2018/09/networking-topics/"/><id>https://michaelnygard.com/blog/2018/09/networking-topics/</id><published>2018-09-30T12:59:52-05:00</published><updated>2018-09-30T12:59:52-05:00</updated><content type="html"><![CDATA[<p>Another quick post based on a Twitter exchange. (Maybe this will help
save content from the ephemera of Tweets.)</p>
<p>A short, incomplete list of topics in networking that programmers should know about:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol">ICMP messages</a></li>
<li><a href="http://tcpipguide.com/free/t_IPMessageFragmentationProcess.htm">Frame size and fragmentation</a></li>
<li><a href="http://man7.org/linux/man-pages/man7/socket.7.html">Socket options</a></li>
<li><a href="http://man7.org/linux/man-pages/man2/listen.2.html">Listen queue and behavior when full.</a></li>
<li><a href="https://github.com/leandromoreira/linux-network-performance-parameters">All the timeouts and why they exist.</a></li>
<li>When read, write, and connect calls block and why.</li>
<li>When memory buffers are copied and how to avoid.</li>
</ul>
<p>A reference I love is the encyclopedic <a href="https://amzn.to/2IsfDMl">The TCP/IP
Guide</a> (note: affiliate link.) It&rsquo;s got
detail on pretty much everything, including bogons and the secret
masters of the Internet&mdash;the DNS root servers and BGP administration.</p>
]]></content></entry><entry><title>Joyful Isolation</title><link href="https://michaelnygard.com/blog/2018/09/joyful-isolation/"/><id>https://michaelnygard.com/blog/2018/09/joyful-isolation/</id><published>2018-09-27T09:08:55-05:00</published><updated>2018-09-27T09:08:55-05:00</updated><content type="html"><![CDATA[<p>Way back in January, Sam Newman <a href="https://twitter.com/samnewman/status/952610105169793025">tweeted this</a> (perhaps rhetorical) question:</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I was in the middle of creating this slide (wrt patch hygiene) and had to stop half-way through and ask myself - aren’t we all just making this worse? <a href="https://t.co/fCTAYDc3Pn">pic.twitter.com/fCTAYDc3Pn</a></p>&mdash; Sam Newman (@samnewman) <a href="https://twitter.com/samnewman/status/952610105169793025?ref_src=twsrc%5Etfw">January 14, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>


<p>It got a handful of retweets recently, and I responded with:</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I&#39;ve said it before, but each of these layers is another attempt to achieve isolation between apps. It could (should?) be fixed with a new OS at the bottom, ditching every layer above that.</p>&mdash; Michael Nygard (@mtnygard) <a href="https://twitter.com/mtnygard/status/1044209034461663232?ref_src=twsrc%5Etfw">September 24, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>


<p>Which definitely needs some expansion as <a href="https://twitter.com/olabini/status/1044805819265634304">Ola Bini</a> pointed out. So here
goes. (Caution: long ramble ahead. Second caution: I&rsquo;m going to gloss
over a lot of details in an effort to convey a bigger picture.)</p>
<p>The textbook definition of an operating system is that it provides
process isolation, memory management, and hardware abstraction. Some
useful operating systems have been built that remove various parts of
this definition, but for now I&rsquo;ll use that.</p>
<p>Let&rsquo;s look at what various degrees of process isolation could mean and
why we&rsquo;ve got that stack of stuff in Sam&rsquo;s slide.</p>
<p>The most basic degree of isolation would be that one process cannot
read or modify another process&rsquo;s memory. &ldquo;Modern&rdquo; operating systems
like Linux, Windows, and macOS do pretty well on that. (They&rsquo;re modern
in the sense of &ldquo;widely used today&rdquo; but all are based on 30+ year old
foundations.)</p>
<p>Memory isolation offers some degree of security. Security will be a
recurring theme.</p>
<p>Other conflicts between processes might arise from error rather than
malice. Overusing the CPU, for example. Or consuming all available
memory. This would allow one process to unfairly deny service to other
processes, so an operating system must also enforce usage
limits.</p>
<p>When today&rsquo;s operating systems were invented (and here I&rsquo;m describing
the Linux kernel as an instance of the Unix family), the idea of
multitenant workload was strictly a mainframe concern. Mini- and
micro-computers barely existed. The largest networks consisted of a
few dozen intermittently connected machines. Most of the users knew
each other by first name. Active, anonymous threats were unknown.</p>
<p>Benign noninterference between processes sufficed.</p>
<p>Process isolation now needs to mean much more than just memory
protection and quota enforcement. In fact, the definition of &ldquo;process&rdquo;
breaks down a bit, too.</p>
<p>In an operating system, a &ldquo;process&rdquo; consists of allocated memory (some
of which may be paged out to storage), a memory mapping, and control
information: threads&rsquo; stacks, open files, network sockets,
entitlements or permissions, interrupt vectors, and so on. The
operating system prevents one process from <em>interfering</em> with another,
but it doesn&rsquo;t prevent it from <em>detecting the presence</em> of others.</p>
<p>Preventing such detection is exactly what&rsquo;s needed for multitenant cloud
workload. A process from user A should have no way to detect the
presence or absence of a process from user B. They might come from
competing organizations. For government workload, they might operate
under different security classification schemes.</p>
<p>As we look at the stack of virtualization and containerization in
Sam&rsquo;s slide, we can see how each layer attempts to plug some detection
holes in lower layers.</p>
<p>The hypervisor is an operating system. It runs other operating systems
because the guest operating systems are bad at preventing detection.</p>
<p>For example, each process should have its own IP address so it cannot
detect other processes by their use of TCP ports it would like to
occupy.</p>
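<p>Here is a small Python sketch of that kind of detection, using only the standard <code>socket</code> module. The helper name <code>port_in_use</code> is hypothetical, and the &ldquo;other tenant&rdquo; is simulated by a listener in the same process. The point is just that a failed <code>bind</code> leaks information about a neighbor.</p>
<pre tabindex="0"><code>import errno
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if some socket on this host already holds the port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as err:
        if err.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()
    return False

# Stand-in for another tenant: a listener on an OS-assigned port.
tenant = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tenant.bind(("127.0.0.1", 0))
tenant.listen(1)
tenant_port = tenant.getsockname()[1]

seen_while_running = port_in_use(tenant_port)
tenant.close()
</code></pre>
<p>With a shared IP address, the probe learns that something else is there without ever touching the other process&rsquo;s memory. Give each workload its own address and the same probe tells it nothing.</p>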
<p>Each process should appear to have full control over the
filesystem. Otherwise, processes could detect each other via changes
to files. (Implemented by the VM, and again by the container.) That
means the application&rsquo;s own configuration files should be
isolated. But it also means the operating system configurations
should be isolated.</p>
<p>Each process should have its own namespace for users. Otherwise they
could detect each other via the user listing. (Implemented by the VM
and again by containers.)</p>
<p>An aside about containers: a &ldquo;container&rdquo; is a process with its own view of
a filesystem plus an isolated &ldquo;namespace&rdquo; for kernel objects. That
means a process running in a container is really executing on the same
underlying kernel as the host operating system. It&rsquo;s just not allowed
to see other processes. Add a virtual NIC and IP address to the
container and it has the kind of isolation I&rsquo;m talking about.</p>
<p>When we look at this stack of layers in terms of detection-prevention,
the crucial need for strong patch hygiene becomes clear. Any hole in
an underlying layer allows detections that should not be
allowed. Since no layer really provides perfect isolation, we must
treat a patch at any layer with the same priority as a ring 0 bug in
the lowest level.</p>
<p>(I also wonder if mainframes still have something to offer here. I
just don&rsquo;t know enough about their operating systems to say one way or
the other. But think about this: IBM had virtual machines in the 1960s.)</p>
<p>What could we do to create an operating system that meets our needs
today?</p>
<p>Elevate non-detectability to the primary design goal. There should be
no call or action an isolated workload can perform that would reveal
the presence or absence of other workload on the same system. That
includes <em>other instances</em> of the same workload!</p>
<p>A program can&rsquo;t know what physical host it runs on. In a really
extreme interpretation, programs can&rsquo;t even be allowed to sample the
clock too quickly, or else they could use timing attacks to detect
other workloads!</p>
<p>Such non-detectability is not possible with Unix-style
kernels. Likewise for Windows kernels. A microkernel like Mach might
be able to achieve it, but Darwin as built would not. All of these
embed the multi-user, multi-process, shared-filesystem model too
deeply. Thus, the stack of virtualization and containerization.</p>
<p>There are some capability-based operating systems that offer
promise. <a href="https://sel4.systems/">seL4</a> comes to mind.</p>
<p>I find unikernels interesting as a way of packaging
applications. An operating system that aims toward true
non-detectability might well use such a &ldquo;super-fat binary&rdquo; as a
unikernel. It would carry the program text along with the expected
filesystem. (A program binary today is mostly an image of the bytes
that will go into memory for execution&mdash;called the text. There is some additional
information about variable initialization and relinking symbols based
on their actual load address.)</p>
<p>Functions as a service certainly step toward greater isolation. Each
function execution might as well happen in a new operating system, as
far as the function itself can tell.</p>
<p>It&rsquo;s likely that this kind of operating system would have a very
different notion of the &ldquo;unit of workload&rdquo; than a process. A process
with threads is a compromise notion anyway. It allows the threads to
share each other&rsquo;s memory but assigns permissions, resources, and
quotas for the collection of threads.</p>
<p>In a container, we get these levels of grouping:</p>
<ol>
<li>The container has a process space (meaning PIDs), IP address, sockets, file
descriptors, file system, and user base. It has an overall quota on
CPU, memory, and network usage.</li>
<li>A process in the container has permissions of one user, resources, and
fine-grained quotas. It cannot see the memory of other processes in
the container.</li>
<li>Threads in a process share memory, but do not have their own
permissions or quotas.</li>
</ol>
<p>If we extend that to cover the VM, hypervisor, and host operating
system, we get 6 levels of grouping but each level has a totally
different model.</p>
<p>I don&rsquo;t know what the design would look like if we aimed for a
homogeneous structure that allowed grouping or isolation at each
level. It would probably look more like an Erlang supervision tree or
seL4 style capability delegation. It would look very different from
the Unix-derived systems we have now.</p>
]]></content></entry><entry><title>Evolving Away From Entities</title><link href="https://michaelnygard.com/blog/2018/04/evolving-away-from-entities/"/><id>https://michaelnygard.com/blog/2018/04/evolving-away-from-entities/</id><published>2018-04-28T16:48:10-05:00</published><updated>2018-04-28T16:48:10-05:00</updated><content type="html"><![CDATA[<p>
Hat tip to <a href="http://stuarthalloway.com/">Stuart Halloway</a>&#x2026; once again a 10 minute conversation with
Stu grew into a combination of code and writing that helped me clarify
my thoughts.
</p>

<p>
I've been working on new content for my Monolith to Microservices
workshop. As the name implies, we start with a monolith. Everyone gets
their own fully operational production environment, running a fork of
the code for <a href="http://lobste.rs">Lobsters</a>. It's a link sharing site with a small but
active group of users. I chose that application for a few reasons:
</p>

<ol class="org-ol">
<li>It's very well written and has good internal structure. That makes
it malleable enough to use in a classroom setting.</li>
<li>It's small enough to be useful during a week-long workshop.</li>
<li>The domain is familiar enough that students don't need a ton of
domain-specific introduction.</li>
</ol>

<p>
One of the features of Lobsters is "hats." From the site's own
description:
</p>

<blockquote>
<p>
Hats are a more formal process of allowing users to post comments
while "wearing such and such hat" to give their words more authority
(such as an employee speaking for the company, or an open source
developer speaking for the project).
</p>
</blockquote>

<p>
Hats are the first feature that we factor out into its own
microservice. I thought it might be interesting to walk through that
process and how the new service is defined.
</p>

<p>
This is going to be a long post, because I'm trying to recapitulate my
whole thought process. Please let me know if I've skipped steps.
</p>

<div id="outline-container-org4aaa984" class="outline-3">
<h3 id="org4aaa984">Point of Departure</h3>
<div class="outline-text-3" id="text-org4aaa984">
<p>
The feature basically works like this:
</p>

<ul class="org-ul">
<li>A user can have zero or more "hats." Each hat has a short name that
designates a project or product. Examples include "bsd" or "docker".</li>
<li>When posting a comment or sending a private message, the user can
choose a hat to "wear". This could be to demonstrate
credibility or make an official statement.</li>
<li>It's the job of site admins to verify the user's identity and
standing relative to the hat.</li>
</ul>

<p>
Given that description, it's pretty natural to think about an API like
<a href="https://hats.docs.apiary.io/#">this one</a>. (Follow the link to read an API doc in "Blueprint" format.)
</p>

<p>
As you read the API description, notice that most of the routes read
like "create this thing", "delete this thing", and so on. It sounds
suspiciously close to CRUD, and that should trigger an uneasy memory
about <a href="https://www.michaelnygard.com/blog/2017/12/the-entity-service-antipattern/">entity services</a>.
</p>

<p>
Entity services are what you get when you only think about the data
and not how you are going to use it.
</p>

<p>
A more subtle problem with this API is that it provides the wrong
point of entry for the most common use case. When a user starts
posting a comment, the site needs to find out what hats that user
wears. It seldom needs to find all the wearers for a hat.
</p>

<p>
We can plaster over this gap by adding more routes to the API. But
there's probably a better way to approach the whole issue.
</p>
</div>
</div>

<div id="outline-container-org077f84a" class="outline-3">
<h3 id="org077f84a">Think About Behavior</h3>
<div class="outline-text-3" id="text-org077f84a">
<p>
If I have a mantra for architecture and design, it's "Think about the
behavior." I advise people to evaluate their designs by walking
through use cases. What components have the ability to make progress
toward the goal? How does the flow of control get there? What
information do I need to supply to it?
</p>

<p>
If we think about behavior in the "hats" feature, we'll see that the
original API has some big gaps.
</p>

<ul class="org-ul">
<li>How does an admin know that someone needs a hat?</li>
<li>When does the system need to read all the users for a hat?</li>
<li>Can someone who has a hat bestow it on someone else?</li>
<li>What happens if someone has a hat when they make a comment but later
loses that hat?</li>
</ul>

<p>
Let's take a behavior-oriented view on the hats. In particular, let's
think about the lifecycle:
</p>

<ol class="org-ol">
<li>A user <span class="underline">requests</span> an admin to bestow a hat upon them.</li>
<li>The admin can <span class="underline">grant</span> that request. In that case, the hat is
attached to the user.</li>
<li>The admin can <span class="underline">reject</span> that request.</li>
</ol>

<p>
If we stick with the CRUD style API, that behavior is "pushed" out to
someplace else. Requests, approvals, rejections all have to go in the
caller. What's left in the service isn't enough to be
interesting. We'd have a caller that still knows all the details of
the data. Any other callers of the Hats service would also be
completely coupled to the details of the data. It might as well just
be an RDBMS table.
</p>

<p>
Why would we just take a single table from an RDBMS and put
it on the far end of an HTTP interface?
</p>
</div>
</div>

<div id="outline-container-org6d7a0fe" class="outline-3">
<h3 id="org6d7a0fe">Take Two</h3>
<div class="outline-text-3" id="text-org6d7a0fe">
<p>
Let's try making an API that maps the actions in the original
controller. After all, we're decomposing a monolith into
microservices. The monolith already works so we know the current
design solves for the features needed.
</p>

<p>
The <a href="https://github.com/mtnygard/lobsters/blob/d6303284a3fb52961198e0056f8fd23368c3e164/app/controllers/hats_controller.rb">controller</a> has these methods:
</p>

<ul class="org-ul">
<li><code>index</code></li>
<li><code>build_request</code></li>
<li><code>create_request</code></li>
<li><code>requests_index</code></li>
<li><code>approve_request</code></li>
<li><code>reject_request</code></li>
</ul>

<p>
Notice something interesting about those methods? None of them talk
about hats! The only one that actually cares about hats is <code>index</code>,
and all it does is get all the hats to serve <a href="https://lobste.rs/hats">the hats page</a> that shows
everything. That's the least interesting of the behaviors. The hats
controller appears to be almost entirely about the <span class="underline">process</span> of
requesting a hat and approving or rejecting such a request. Let's
set a bookmark called "Requests" here and come back to it later.
</p>

<p>
Where do hats actually get used? Let's look at comments. When a user
starts to post a comment or reply to another comment, there's
<a href="https://github.com/mtnygard/lobsters/blob/master/app/views/comments/_commentbox.html.erb#L47">this bit of code</a> that checks to see if the user has any hats. If so,
the comment fragment offers the user the option to "put on a
hat". That gets carried through to the comments controller, where
<a href="https://github.com/mtnygard/lobsters/blob/master/app/controllers/comments_controller.rb#L22">the hat is attached to the comment</a>. (Now we know what happens if the
user doffs a hat&#x2026; old comments still show that they did wear the hat
at the time of the comment. Nice!)
</p>

<p>
The comments controller doesn't have any methods about the hats,
although it does use hat data. It gets the hat data from the User
model which <a href="https://github.com/mtnygard/lobsters/blob/master/app/models/user.rb#L33"><code>has_many :hats</code></a>.
</p>

<p>
Now we understand the feature much better. Instead of talking about it
in the abstract, we can talk about when each part of the feature is
activated and used. We understand the lifecycle of the data: who
creates it? When? In response to what signals?
</p>

<p>
All of these are essential to designing successful microservices! If
you try to design services in a vacuum, you'll find they don't work
together and you need a bunch of glue code in the service
consumers. (Hint: if you start trying to solve distributed two-phase
commit across microservices, then you haven't gotten concrete enough
about the actual use cases.)
</p>
</div>
</div>

<div id="outline-container-orgebd3a2e" class="outline-3">
<h3 id="orgebd3a2e">Not One But Two</h3>
<div class="outline-text-3" id="text-orgebd3a2e">
<p>
Recall that the <code>HatsController</code> class seemed to be about requesting a
hat more than the actual hat? Suppose we created microservice methods
like:
</p>

<ul class="org-ul">
<li>Request hat</li>
<li>Approve hat request</li>
<li>Reject hat request</li>
<li>List pending hat requests</li>
<li>See my hat request</li>
</ul>

<p>
That would pretty much map one-to-one with the existing
<code>HatsController</code> methods. Seems like an easy way to solve the
problem. The only catch is that it seems a touch too specific. We
can often make microservices simpler, more general, and more useful by
abstracting the interface up one step. Let's try that out:
</p>

<ul class="org-ul">
<li>Request "thing"</li>
<li>Approve request for "thing"</li>
<li>Reject request for "thing"</li>
<li>List pending requests</li>
<li>See my request</li>
</ul>

<p>
"Request thing" and "Approve or reject request" seem to be pretty
general ideas. I bet you can think of half a dozen other uses for that
concept in your company. What is the "thing" though? We need some kind
of concrete representation, right? Let's try to avoid premature
commitment to that. I want to see how long we can just use a URL to
identify the thing.
</p>

<p>
With this in mind, take a look at the <a href="https://request5.docs.apiary.io/#">new request service API</a>. Like
the <code>HatsController</code>, this doesn't say anything about hats. In fact,
it seems to rely on external information for almost everything. It
uses URLs to identify the person (or system) making the request, the
thing being requested, and the person (or system) that approves or
rejects the request.
</p>
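<p>
To make that concrete, here is a minimal sketch of what a request record might look like if everything it refers to is just a URL. The class, field, and method names are my own for illustration, not the ones in the API doc, and the <code>example.com</code> URLs are made up.
</p>

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Request:
    subject_url: str                     # who (or what system) is asking
    object_url: str                      # the "thing" being requested
    status: Status = Status.PENDING
    decider_url: Optional[str] = None    # who approved or rejected it

    def _decide(self, decider_url, outcome):
        # A request can be decided only once.
        if self.status is not Status.PENDING:
            raise ValueError("request has already been decided")
        self.status = outcome
        self.decider_url = decider_url

    def approve(self, decider_url):
        self._decide(decider_url, Status.APPROVED)

    def reject(self, decider_url):
        self._decide(decider_url, Status.REJECTED)


# Hypothetical URLs; nothing in the record knows what a hat is.
req = Request(
    subject_url="https://example.com/users/alice",
    object_url="https://example.com/hats/bsd",
)
req.approve("https://example.com/users/admin")
```

<p>
The record carries the lifecycle (pending, approved, rejected) and nothing else. Whatever the URLs point to stays outside the service.
</p>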

<p>
This may seem like premature generalization and you may cry "YAGNI" at
me. I understand. But there's something about YAGNI you have to keep
in mind&#x2026; it applies when you can keep the cost of change low and
refactor across interfaces. Microservices do well at keeping the cost
of change low, but they make refactoring across interfaces much harder. The whole
idea is that a service interface is isomorphic to a boundary in your
organization. So we don't have collective code ownership, we don't
have refactoring across the interface, and I contend that YAGNI must
be greatly weakened as a rule.
</p>
</div>
</div>

<div id="outline-container-org7749872" class="outline-3">
<h3 id="org7749872">What Was That About Hats Again?</h3>
<div class="outline-text-3" id="text-org7749872">
<p>
Requests are sorted, more or less. Now we need to turn our attention
to the question of the actual hats. We saw earlier that hats appear in
three places:
</p>

<ol class="org-ol">
<li>When building a page with comments on it, the (initially hidden)
comment form needs to know what hats a user has.</li>
<li>When posting a comment, copy the ID of whatever hat the user was
wearing into the comment itself.</li>
<li>When rendering a comment, display the text of whatever hat is
attached to the comment.</li>
</ol>
</div>

<div id="outline-container-orgff01f79" class="outline-4">
<h4 id="orgff01f79">Hats for a User</h4>
<div class="outline-text-4" id="text-orgff01f79">
<p>
We can mostly handle the first case by querying for requests by
subject (the subject being the user.) This could be done when the user
logs in or the first time the user goes to a comment page.
</p>

<p>
However, the current method for querying by subject will return too
much. First of all, it will return requests that are pending or were
rejected. We can easily handle that using a matrix-query style of URL
with both <code>subject</code> and <code>status</code> as parameters. Second, if we really
do use the Request service for more than the hats feature, we don't
want other "kinds" of requests appearing in the comments page. This
one is trickier, since it needs a kind of meta-data that doesn't exist
on the current definition of the Request service.
</p>
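<p>
As a rough sketch of the first fix, assuming "matrix-query style" means semicolon-separated matrix parameters on the path: the helper below builds such a URL. The function name <code>matrix_url</code> and the host <code>requests.example.com</code> are placeholders of mine, not part of the real service.
</p>

```python
from urllib.parse import quote


def matrix_url(base, **params):
    """Append ;name=value matrix parameters to a base URL.

    Values are percent-encoded in full because the subject is itself
    a URL and would otherwise confuse the path.
    """
    parts = [f"{name}={quote(value, safe='')}"
             for name, value in sorted(params.items())]
    return ";".join([base] + parts)


# Only approved requests for one user (hypothetical URLs).
url = matrix_url(
    "https://requests.example.com/requests",
    subject="https://example.com/users/alice",
    status="approved",
)
```

<p>
This only narrows by subject and status. It does nothing about the second issue, the "kind" of request.
</p>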

<p>
I'm not going to add that metadata just yet. That's because my
workshop simulates the process of progressively splitting services out
from a monolith. It's a common case to discover that your existing
functionality is a subset of some more general, more valuable use
case. That's when you go back and apply some data migrations and
define a new API that deals with the general case. You then make the
original API "magically" add the new metadata.
</p>

<p>
This may result in API names like "foo2" and "baz3." That's OK. The
refactored, evolved version of your system won't look like a
greenfield design would. Your system will show its history. Don't
think of that as ugly scars. Think of it like laugh lines.
</p>
</div>
</div>

<div id="outline-container-org0797559" class="outline-4">
<h4 id="org0797559">Adding a hat to a comment</h4>
<div class="outline-text-4" id="text-org0797559">
<p>
When we find out what hats a user has, we get a list of URLs. Adding a
hat to a comment doesn't need any additional interaction with requests
or hats. Just copy the URL onto the comment where the code <a href="https://github.com/mtnygard/lobsters/blob/master/app/controllers/comments_controller.rb#L23">used to copy the ID</a>.
</p>
</div>
</div>

<div id="outline-container-org7decead" class="outline-4">
<h4 id="org7decead">Displaying a hat</h4>
<div class="outline-text-4" id="text-org7decead">
<p>
One last interesting bit. We need to exchange a hat URL (from the
'object' of the original request) for a text label to display. This is
the first thing we've encountered that is truly unique to hats&#x2026; and
it's basically a reference table.
</p>

<p>
This post is getting quite long as it is and I need to save
<span class="underline">something</span> for people who come to my class. So I'll leave you with
these quick thoughts:
</p>

<ul class="org-ul">
<li>Reference tables are a common need. Maybe we can create a more
general service for curating reference tables. That would include
information like who is allowed to add entries.</li>
<li>Someone may request a hat that doesn't exist yet. If the request is
approved, then the hat "poofs" into existence. So what is the
difference between "proposed" reference data and "current" reference
data?</li>
<li>Is that lifecycle both general and interesting? Maybe there
are two different APIs for dealing with curating the data versus
just looking at current.</li>
<li>On the consuming side, we might decide to simply cache all the
existing entries for a reference table. It's reasonable to have a
query method on the table that says "give me the complete list of
Hats" (or countries, or currencies, time zones, etc.) Fetching those
at startup time is reasonable, but on a cache miss we still need to
go ask about a single entry.</li>
<li>If we do create a reference data service, which deployment model do
we want to use: A single instantiation of the service with all of
our reference tables? Or one instantiation for each reference table?
Think about the tradeoffs here both in terms of infrastructure cost
and operations cost. (More instances = more infrastructure. Fewer
instances = less ops cost at low volume, but more operations as
scaling becomes harder.)</li>
</ul>
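<p>
The caching idea in the list above can be sketched like this. It is only an illustration: <code>ReferenceCache</code>, <code>fetch_all</code>, and <code>fetch_one</code> are hypothetical names, and a plain dict stands in for the reference data service.
</p>

```python
class ReferenceCache:
    """Load a whole reference table at startup, then fall back to a
    single-entry lookup when an unknown key shows up later."""

    def __init__(self, fetch_all, fetch_one):
        self._fetch_one = fetch_one
        self._entries = dict(fetch_all())      # bulk load at startup

    def label(self, url):
        if url not in self._entries:           # cache miss
            self._entries[url] = self._fetch_one(url)
        return self._entries[url]


# A dict stands in for the remote service in this sketch.
backing_store = {"https://example.com/hats/bsd": "bsd"}
calls = []


def fetch_all():
    return dict(backing_store)


def fetch_one(url):
    calls.append(url)
    return backing_store[url]


cache = ReferenceCache(fetch_all, fetch_one)

# A hat "poofs" into existence after the cache was loaded.
backing_store["https://example.com/hats/docker"] = "docker"

bsd = cache.label("https://example.com/hats/bsd")        # served from cache
docker = cache.label("https://example.com/hats/docker")  # one remote lookup
```

<p>
This sketch never expires entries, so a renamed or removed hat would go unnoticed. A real consumer would have to decide how stale it can tolerate being.
</p>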
</div>
</div>
</div>

<div id="outline-container-org31afc1c" class="outline-2">
<h2 id="org31afc1c">Conclusions</h2>
<div class="outline-text-2" id="text-org31afc1c">
<p>
Our first idea is usually not the best one. To understand the
boundaries and interface that make sense for a service, we have to
think about it <i>in situ</i>. We aren't trying to model the world. We are
trying to build systems that deliver features. Those features are
specific and we must design our APIs to deliver those specific
features. At the same time, however, we can often deliver the features
just as easily by abstracting the API up one level. This makes a
service more general and more reusable. It delivers more marginal
value (i.e., it makes future work cheaper) and may even be simpler to
write because it has less special-case logic or constraints.
</p>

<p>
We need to be careful to not push work into the gaps between
services. One way to avoid that is to design APIs in terms of the
caller's needs rather than the provider's view of the world.
</p>

<p>
Finally, sometimes the original service we set out to build evaporates
completely when we discover that an apparently unitary concept is
actually a composition of different concepts hiding under a noun.
</p>
</div>
</div>
]]></content></entry><entry><title>Data is the New Oil</title><link href="https://michaelnygard.com/blog/2018/03/data-is-the-new-oil/"/><id>https://michaelnygard.com/blog/2018/03/data-is-the-new-oil/</id><published>2018-03-02T16:11:35-06:00</published><updated>2018-03-02T16:11:35-06:00</updated><content type="html"><![CDATA[
<p>
The other day I tweeted that "Data is the New Oil." A lot of people
retweeted, but quite a few asked what I meant by that. I'll amplify
a bit to explain the analogy.
</p>

<p>
This ended up being a lot to unpack from a quick tweet! For quite a
few years now, I've used Twitter as a way to scratch the itch of
personal expression. A quick sound bite there, highly compressed and
idiosyncratic, was just enough to relieve the mental pressure. As a
consequence, I stopped blogging nearly as much. Lately, though, I feel
the need for nuance and explanation, so I hope to do more in this
space.
</p>

<p>
&#x2014;
</p>

<p>
First, oil was the key resource that drove the industrial revolution
in the 20th century. That was the age of oil and steel, according to
economist and historian Carlota Perez. In <a href="http://amzn.to/2FMTojb">Technological Revolutions and Financial Capital</a>, Prof. Perez shows that every technology
revolution goes through predictable phases, from irruption to
exhaustion. Economics in the 20th C were totally defined by access to
and movement of oil. Those who had it either had leverage or became
victims, depending on their ability to create military and economic
alliances. Oil reserves could put a nation on the world stage. A
nation that bargained well with its oil would have power far beyond
what its population size or technological ability would usually merit.
</p>

<p>
In fact, a large part of the U.S. economic dominance in the latter
portion of the 20th C can be explained by the petrodollar. Since the
Bretton Woods conference after WWII, oil transactions around the world
were denominated in USD. If Saudi Arabia sold oil to China, then China
had to pay SA in dollars. That meant China needed plenty of USD
currency reserves and SA needed the US to hold riyal. (The biggest
economic story in the world right now is <i>not</i> the DJIA hitting 26,000
or falling by 0.5% in a day&#x2026; it's that China, Russia, Saudi
Arabia, and Iran are now trading oil denominated in rubles, yuan, and SDRs.)
</p>

<p>
But before the internal combustion engine, <i>oil wasn't a resource</i>;
was a nuisance. The oil-rich land in Oklahoma is where the
U.S. Government settled people it wanted to get out of the way. Oil
gets in the way of farming. It was development of the new technology
that turned oil from a hassle into a resource.
</p>

<p>
Once oil became a resource, a feedback loop got underway. More demand
for oil led to more extraction, which caused industries to find new
uses for the stuff. Plastics, fertilizers, etc. Increased demand drove
increased supply and more efficient extraction, which in turn led to
more demand.
</p>

<p>
Prof. Perez already identified the next technological revolution as
information technology. However, I think her book got the timing
wrong. It was published in 2002 and dated the start of the revolution
to the advent of the personal computer in 1970. With the advantage of
16 years of additional observation, I think that there were two
missing pieces: networking and machine learning. The real irruption of
information technology started over the last decade. And as with the
previous revolution, this one creates a need for a new resource:
data.
</p>

<p>
Before this, data was a nuisance. It filled up disks and needed to be
purged. It was often dirty (meaning not fully correct or conforming to
syntactic rules) and incomplete. But toward the end of the 00's, <a href="http://www.michaelnygard.com/blog/2008/07/mounds-of-filthy-data/">some people</a> started to see it as a resource. You might spend a lot of time
cleansing and canonicalizing small data sets. But with a lot of data,
it's impossible. At the same time though, you don't need to clean the
data to glean information. Some kinds of errors average out and
interesting signals emerge.
</p>

<p>
(If only I had come up with the name "Big Data" instead of "Dirty
Data!")
</p>

<p>
Of course, we're well beyond mere Big Data now. With every eye turning
toward machine learning, we've got a new challenge for our
data. That's training.
</p>

<p>
A machine learning model is only as good as the training data. The
training data itself needs to be classified. In other words, to train
a machine to detect cars, you need a lot of photos where some are
tagged "this has a car" and others <i>don't</i> have that tag. Yes,
some CAPTCHAs just might be using you to train a machine, instead of
proving you aren't one.
</p>

<p>
(Aside: we're going to see a lot of conflict about biases in ML
models. We will expect the machines to be free of human cognitive and
social biases, but we're training them with data created and
classified by humans! We will actually be asking the machines to make
errors in a systematic way to offset humans' systematic errors in the
training inputs. It's not hard to see why HAL 9000 went spare.)
</p>

<p>
Data is digital, but it's not easy to move around in these
quantities. We're not talking about a <a href="https://en.wikiquote.org/wiki/Andrew_S._Tanenbaum">station wagon full of tapes barreling down the highway</a>&#x2026; we're talking
about a <a href="https://www.youtube.com/watch?v=Sd5ZLJWQmss">convoy of 18-wheelers</a> loaded with racks full of disks.
</p>

<p>
Companies that have tagged or classified data sets are the new oil
producing and exporting countries. If you have large quantities of
classified photos, video, voice, text, etc. you are well-positioned to
train ML models. If you don't have such a dataset, then you need to
create a consumer-oriented startup to get humans to do the initial
classification for you or you need to license access to data from one
of the big players. (There are some open-access datasets that
hobbyists can use, but those will never be as large or as current as
the proprietary data sets.) Alternatively, focus on providing the
engineering support and tooling for the technostates that have the
data, the same way that Norway provides engineering to Saudi Arabia.
</p>

<p>
Just as oil production led to new uses of oil that reshaped everything
from consumer products to food production to hygiene, I fully expect
data-fueled ML models to reshape this century. Moreover, we will see
demand for ever-greater data production from our homes, workplaces,
and devices. This will cause tension and conflict about data use just
as happened with land-use, water-use, and mineral rights. That will
lead to new legal regimes and doctrines. In extreme cases, it may lead
to revolutions similar to the <a href="https://en.wikipedia.org/wiki/Revolutions_of_1848">Revolutions of 1848</a> in Europe.
</p>
]]></content></entry><entry><title>Coherence Penalty for Humans</title><link href="https://michaelnygard.com/blog/2018/01/coherence-penalty-for-humans/"/><id>https://michaelnygard.com/blog/2018/01/coherence-penalty-for-humans/</id><published>2018-01-09T10:22:23-08:00</published><updated>2018-01-09T10:22:23-08:00</updated><content type="html"><![CDATA[
<p>
This is a brief aside from my ongoing series about avoiding entity
services. An interesting dinner conversation led to thoughts that I
needed to write down.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Amdahl's Law</h2>
<div class="outline-text-2" id="text-1">
<p>
In 1967, Gene Amdahl presented a case against multiprocessing
computers. He argued that the maximum speed increase for a task would
be limited because only a portion of the task could be split up and
parallelized. This portion, the "parallel fraction," might differ from
one kind of job to another, but it would always be present. This
argument came to be known as <a href="https://en.wikipedia.org/wiki/Amdahl's_law">Amdahl's Law</a>.
</p>

<p>
When you graph the "speedup" for a job relative to the number of
parallel processors devoted to it, you see this:
</p>


<div class="figure">
<p><img src="https://upload.wikimedia.org/wikipedia/commons/e/ea/AmdahlsLaw.svg" alt="AmdahlsLaw.svg" />
</p>
</div>

<p>
The curve flattens toward an asymptote set by the serial fraction, so
there is an upper limit to the speedup.
</p>
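<p>
For the curious, the usual textbook form of the law is short enough to write down. This is a sketch with my own function and variable names. The 95% parallel fraction is just an example value.
</p>

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup of a job when only parallel_fraction of it can be
    spread across the given number of processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


p = 0.95                      # example: 95% of the job parallelizes
ceiling = 1.0 / (1.0 - p)     # limit as processors grows without bound

one = amdahl_speedup(p, 1)
many = amdahl_speedup(p, 1024)
```

<p>
Even with 1,024 processors the speedup stays under the ceiling of about 20, because the serial 5% never gets any faster.
</p>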
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">From Amdahl to USL</h2>
<div class="outline-text-2" id="text-2">
<p>
The thing about Amdahl's Law is that, when Gene made his argument,
people weren't actually building very many multiprocessing
computers. His formula was based on first principles: if the serial
fraction of a job is exactly zero, then it's not <span class="underline">a</span> job but several.
</p>

<p>
Neil Gunther extended Amdahl's Law based on observations of
performance measurements from many machines. He arrived at the
<a href="http://www.perfdynamics.com/Manifesto/USLscalability.html">Universal Scalability Law</a>. It uses two parameters to represent
contention (which is similar to the serial fraction) and
<span class="underline">incoherence</span>. Incoherence refers to the time spent restoring a common
view of the world across different processors.
</p>

<p>
In a single CPU, incoherence penalties arise from caching. When one
core changes a cache line, it tells other cores to eject that line
from their caches. If they need to touch the same line, they spend
time reloading it from main memory. (This is a slightly simplified
description&#x2026; but the more precise form still has incoherence
penalty.)
</p>

<p>
Across database nodes, incoherence penalties arise from consistency
and agreement algorithms. The penalty can be paid when data is changed
(as in the case of transactional databases) or when the data is read
in the case of eventually consistent stores.
</p>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Effect of USL</h2>
<div class="outline-text-2" id="text-3">
<p>
When you graph the USL as a function of number of processors, you get
the green line on this graph:
</p>


<div class="figure">
<p><img src="/images/blog/2018-01-09-coherence-penalty-for-humans/gman-scale4.png" alt="gman-scale4.png" />
<i>(Image from <a href="http://www.perfdynamics.com/">perfdynamics.com</a>)</i>
</p>
</div>

<p>
(The purple line shows what Amdahl's Law would predict.)
</p>

<p>
Notice that the green line reaches a peak and then declines. It means
that there is a number of nodes that produces maximum throughput. Add
more processors and throughput goes down. Overscaling hurts
throughput. I've seen this in real-life load testing.
</p>
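<p>
Here is a small sketch of the USL in its commonly published form, with <code>alpha</code> for contention and <code>beta</code> for incoherence. The parameter values are invented for illustration and are not measurements from any system.
</p>

```python
import math


def usl_throughput(n, alpha, beta):
    """Relative throughput of n processors under the USL.

    alpha models contention, beta models incoherence. With beta set
    to 0 this reduces to Amdahl's Law.
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_processors(alpha, beta):
    """Processor count at which throughput tops out."""
    return math.sqrt((1 - alpha) / beta)


alpha, beta = 0.05, 0.001          # made-up example values
peak = peak_processors(alpha, beta)

at_10 = usl_throughput(10, alpha, beta)
at_31 = usl_throughput(31, alpha, beta)
at_100 = usl_throughput(100, alpha, beta)
```

<p>
With these numbers the peak lands near 31 processors. Going to 100 gives less throughput than stopping at 31, which is the overscaling effect in the graph.
</p>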

<p>
We'd often like to increase the number of processors and get more
throughput. There are exactly two ways to do that:
</p>

<ol class="org-ol">
<li>Reduce the serial fraction
</li>
<li>Reduce the incoherence penalty
</li>
</ol>
</div>
</div>

<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">USL in Teams?</h2>
<div class="outline-text-2" id="text-4">
<p>
Let's try an analogy. If the "job" is a project rather than a
computational task, then we can look at the number of people on the
project as the "processors" doing the work.
</p>

<p>
In that case, the serial fraction would be whatever portion of the
work can only be done one step after another. That may be fodder for a
future post, but it's not what I'm interested in today.
</p>

<p>
There seems to be a direct analog for the incoherence penalty. Whatever
time the team members spend re-establishing a common view of the
universe is the incoherence penalty.
</p>

<p>
For a half-dozen people in a single room, that penalty might be really
small. Just a whiteboard session once a week or so.
</p>

<p>
For a large team across multiple time zones, it could be large and
formal. Documents and walkthroughs. Presentations to the team, and so
on.
</p>

<p>
In some architectures coherence matters less. Imagine a team with
members across three continents, but each one works on a single
service that consumes data in a well-specified format and produces
data in a well-specified format. They don't require coherence about
changes in the processes, but would need coherence for any changes in
the formats.
</p>

<p>
Sometimes tools and languages can change the incoherence penalty. One of
the arguments for static typing is that it helps communicate across
the team. In essence, types in code are the mechanism for broadcasting
changes in the model of the world. In a dynamically typed language,
we'd either need secondary artifacts (unit tests or chat messages) or
we'd need to create boundaries where subteams only rarely needed to
re-cohere with other subteams.
</p>

<p>
All of these are techniques aimed at the incoherence penalty. Let's
recall that overscaling causes reduced throughput. So if you have a
high incoherence penalty and too many people, then the team as a whole
moves slower. I've certainly experienced teams where it felt like we
could cut half the people and move twice as fast. USL and the
incoherence penalty now help me understand why that was true&#x2014;it's
not just about getting rid of deadwood. It's about reducing the
overhead of sharing mental models.
</p>
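<p>
To make that concrete, here is a rough sketch that applies the USL
formula to head count. Treating people as processors is the analogy
from this post, not an established model, and the two coefficients are
invented. I picked a deliberately high incoherence penalty to show the
effect.
</p>

```python
def team_throughput(people: int, serial: float, incoherence: float) -> float:
    """USL applied to a team: people play the role of processors."""
    return people / (
        1 + serial * (people - 1) + incoherence * people * (people - 1)
    )


# Invented coefficients for a team that spends a lot of time re-cohering.
big_team = team_throughput(20, serial=0.05, incoherence=0.02)
half_team = team_throughput(10, serial=0.05, incoherence=0.02)

print(f"20 people: {big_team:.2f}  10 people: {half_team:.2f}")
```

<p>
Under these assumptions the smaller team out-produces the larger one,
because the larger team is already past its peak.
</p>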

<p>
In <a href="http://www.michaelnygard.com/blog/2015/07/the-fear-cycle/">The Fear Cycle</a> I alluded to codebases where people knew large scale
changes were needed, but were afraid of inadvertent harm. This would
imply a team that was overscaled and <span class="underline">never</span> achieved coherence. Once
lost, it seems to be really hard to re-establish. That means ignoring
the incoherence penalty is not an option.
</p>
</div>
</div>

<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">USL and Microservices</h2>
<div class="outline-text-2" id="text-5">
<p>
By the way, I think that the USL explains some of the interest in
microservices. By splitting a large system into smaller and smaller
pieces, deployed independently, you reduce the serial fraction of a
release. In a large system with many contributors, the serial fraction
comes from integration, testing, and deployment activities. Part of
the premise for microservices is that they don't need the integration
work, integration testing, or delay for synchronized deployment.
</p>

<p>
But, the incoherence penalty means that you might not get the desired
speedup. I'm probably stretching the analogy a bit here, but I think
you could regard interface changes between microservices as requiring
re-coherence across teams. Too much of that and you won't get the
desired benefit of microservices.
</p>
</div>
</div>

<div id="outline-container-sec-6" class="outline-2">
<h2 id="sec-6">What to do about it?</h2>
<div class="outline-text-2" id="text-6">
<p>
My suggestion: take a look at your architecture, language, tools, and
team. See where you spend time re-establishing coherence when people
make changes to the system's model of the world.
</p>

<p>
Look for <b>splits</b>. Split the system with internal boundaries. Split the
team.
</p>

<p>
Use your environment to communicate the changes so re-cohering
can be a broadcast effort rather than one-to-one conversations.
</p>

<p>
Look at your team communications. How much of your time and process is
devoted to coherence? Maybe you can make small changes to reduce the
need for it.
</p>
</div>
</div>
]]></content></entry><entry><title>Services By Lifecycle</title><link href="https://michaelnygard.com/blog/2018/01/services-by-lifecycle/"/><id>https://michaelnygard.com/blog/2018/01/services-by-lifecycle/</id><published>2018-01-05T14:00:44-06:00</published><updated>2018-01-05T14:00:44-06:00</updated><content type="html"><![CDATA[<p>This post took a lot longer to pull together than I expected. Not
because it was hard to write, but because it was too easy to write
<em>too much</em>. Like a
<a href="https://www.youtube.com/watch?v=96aW1V8DhEU">pre-bonsai tree</a>, it
would grow out of control and get pruned back over and over.</p>
<p>In the meantime, I delivered
a workshop
and spent some lovely holiday time with my family.</p>
<p>But it&rsquo;s a new year now, and January is devoid of holidays so it&rsquo;s
high time I got back to business.</p>
<h1 id="avoiding-the-entity-service">Avoiding the Entity Service</h1>
<p>In my
<a href="http://www.michaelnygard.com/blog/2017/12/the-entity-service-antipattern">last post</a>,
I made a case against entity services. To recap, an entity service is
a set of CRUD operations on a business entity such as <em>Person</em>,
<em>Location</em>, <em>Contract</em>, <em>Order</em>, etc. It&rsquo;s an antipattern because it
creates high semantic and operational coupling. Edge services suffer
from common-mode failures through their shared dependency on the
entity services. Changes or outages in the entity services have large
&ldquo;failure domains.&rdquo;</p>
<p>A lot of good advice springs from Eric Evans&rsquo;s hugely influential book
&ldquo;Domain-Driven Design.&rdquo; It was written before the service era, but
seems to apply well now. I&rsquo;m not an expert on DDD, though, so I&rsquo;m
going to offer some techniques that may or may not be described
there. (I dig the &ldquo;bounded context&rdquo; idea, but need to re-read the
whole book before I comment on it more.)</p>
<p>There are several ways to avoid entity services. This post explores
just one (though it&rsquo;s one I particularly like.) Future posts will look
at additional techniques.</p>
<h2 id="focus-on-behavior-instead-of-data">Focus on Behavior Instead of Data</h2>
<p>When you think about what a service <em>knows</em>, you always end up back at
CRUD. I recommend thinking in terms of the service&rsquo;s
responsibilities. (And don&rsquo;t say it&rsquo;s responsible for <em>knowing some
data</em>!) Does it apply policy? Does it aggregate a stream of concepts
into a summary? Does it facilitate some kinds of changes? Does it
calculate something? And so on.</p>
<p>Once you know what a service does, you can figure out what it needs to
know. For instance, a service that restricts content delivery based on
local laws needs to know a few things:</p>
<ol>
<li>What jurisdiction applies?</li>
<li>What classifiers are on the content?</li>
<li>What classifiers are not allowed in that jurisdiction?</li>
</ol>
<p>Notice that #1 and #2 are specific to an individual request, while #3
is slowly-changing data. Thus it makes sense for the service to <em>know</em>
#3 and be <em>told</em> #1 and #2.</p>
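<p>As a rough sketch of that split, here is how such a service might look in Python. The class and method names are hypothetical, and the in-memory dictionary stands in for whatever store holds the slowly-changing data.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ContentRestrictionService:
    # Item 3: slowly-changing data the service knows.
    # Maps a jurisdiction to the classifiers not allowed there.
    banned: dict[str, set[str]] = field(default_factory=dict)

    def update_policy(self, jurisdiction: str, classifiers: set[str]) -> None:
        """Populated out of band, e.g. by a legal team's GUI or a vendor feed."""
        self.banned[jurisdiction] = set(classifiers)

    def may_deliver(self, jurisdiction: str, content_classifiers: set[str]) -> bool:
        """Items 1 and 2 are told to the service on each request."""
        return self.banned.get(jurisdiction, set()).isdisjoint(content_classifiers)


service = ContentRestrictionService()
service.update_policy("XX", {"gambling"})
print(service.may_deliver("XX", {"gambling", "sports"}))
print(service.may_deliver("XX", {"sports"}))
```

<p>The request carries the jurisdiction and the classifiers, so the service never has to call back to ask for them.</p>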
<p>This leads us to a deeper discussion about what the service knows. How
does that data get into the service? Is there a GUI for a legal team?
Maybe there&rsquo;s a feed we can purchase from a data vendor. Maybe we need
to apply machine learning based on lawsuits and penalties incurred
when we violate local laws. (I&rsquo;m kidding! Don&rsquo;t do the last one!) The
answers to these questions will firm up your design and situate your
service in its ecosystem.</p>
<h2 id="model-like-its-1999">Model Like It&rsquo;s 1999</h2>
<p>When modeling, I like to use a technique from object-oriented
design. <a href="http://wiki.c2.com/?CrcCard">CRC cards</a> let me lay out
physical tokens that represent services. Then I can play a game where
I simulate a request coming in at the edge. A service can add
information to a request and pass it along to another service,
following the &ldquo;Tell, Don&rsquo;t Ask&rdquo; principle.</p>
<p>If you are in a team, you can deal out cards to players then simulate
a request by passing a physical object around. That will quickly
reveal gaps in a design. Some common gaps that I see:</p>
<ol>
<li>A service doesn&rsquo;t know where to send the request. It lacks
knowledge of other services that can continue processing. The
solution is either to statically introduce it to the next party or
to provide URLs in the data that lead to the handler.</li>
<li>A service receives a request that is insufficient. The incoming
request either lacks information or has an implicit context that
should be turned into data on the request.</li>
</ol>
<p>While playing the CRC game, it&rsquo;s OK to assume your service already has
data it naturally depends on. That is, populating the slowly-changing
data the service uses should be considered an asynchronous process relative to
the current request. But do make note of that slowly-changing data so
you remember to build in the flows needed to populate it.</p>
<p>If you follow &ldquo;Tell, Don&rsquo;t Ask&rdquo; strictly, then the activation set will
be a strict tree. Anywhere a service calls out to more than one
downstream, it should be sending instructions forward in parallel
rather than serially making queries followed by an instruction.</p>
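<p>Here is one way that might look in code. This is only a sketch: plain Python callables stand in for service calls, and every name in it is made up for illustration.</p>

```python
from typing import Callable

Request = dict[str, object]
Handler = Callable[[Request], None]


def make_enricher(extra: Request, downstream: list[Handler]) -> Handler:
    """Build a handler that adds what it knows, then tells every downstream."""

    def handle(request: Request) -> None:
        enriched = {**request, **extra}
        # Each downstream gets an instruction; none of them is queried.
        # These calls are independent, so they could be sent in parallel.
        for tell in downstream:
            tell(enriched)

    return handle


fulfilment: list[Request] = []
audit: list[Request] = []
step = make_enricher({"jurisdiction": "XX"}, [fulfilment.append, audit.append])
step({"request_id": "r-1"})
print(fulfilment, audit)
```

<p>Nothing flows back up from the downstream handlers, which is what keeps the activation set a tree.</p>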
<h2 id="dealing-with-consistency">Dealing with Consistency</h2>
<p>If it were just a matter of passing requests along with extra data, then life would be simple. As often happens, trouble comes from side effects.</p>
<p>Services are not pure functions. If a service call <em>always</em> results in the same result for the same parameters, then you don&rsquo;t need a service. Make a library and avoid the operational overhead! Services only make sense when they change something in the world. That means state and state changes are unavoidable concerns in a service-based architecture.</p>
<p>Consistency immediately comes up as an issue. Many words have already been written about <a href="https://en.m.wikipedia.org/wiki/CAP_theorem">CAP</a>. Some good, some misguided, and some pure marketing. I even wrote an earlier post about the subtle differences between the <a href="http://thinkrelevance.com/blog/2013/12/23/beware-inconsistent-definitions-of-consistency">C in CAP versus the C in ACID</a>.</p>
<p>Let&rsquo;s look at one way to deal with consistency in the face of changing state.</p>
<h2 id="divide-services-by-lifecycle-in-a-business-process">Divide Services by Lifecycle in a Business Process</h2>
<p>Many business processes have entities that go through a series of milestones. In a particular state, changes are allowed to certain attributes but not others. Once a subset of the properties are &ldquo;valid&rdquo; (whatever <em>that</em> means) the entity can transition to the next stage of the business process.</p>
<p>Instead of viewing this as a single entity with a bunch of booleans, or a CURRENT_STATE attribute (which implies a state machine that is unknown to consumers) we can view each state as a <em>different thing</em>.</p>
<p>For example, consider this process from a peer-to-peer lending situation:</p>
<ol>
<li>A loan requestor starts by creating a project proposal. The requestor can provide descriptive text, an amount to request, and some media assets (projects with big vivid pictures get funded faster.)</li>
<li>Once the loan request is completed, the requestor submits it for approval. At this point, the requestor is no longer allowed to change the amount requested.</li>
<li>An analyst from the host company reviews the proposal. In parallel, a background job checks the requestor&rsquo;s credit score, repayment history, and criminal record.</li>
<li>The analyst reviews the request and either assigns a target interest rate, rejects the request outright, or indicates that more information is needed from the requestor.</li>
<li>Once approved, the proposal is visible to funders. Funders can commit to a certain amount (contingent on <em>their</em> credit scores.) At this stage, none of the proposal information can be changed, although the whole proposal could be withdrawn.</li>
<li>Once fully funded, funders must transfer money within 3 days. No additional funders are allowed to join the project at this time, but they can go on a waiting list in case some of the committed funders fail to supply the money.</li>
<li>Once funds are in the funders&rsquo; accounts, it all gets transferred into a short-term holding account. The project information and all the individuals&rsquo; information (tax IDs, etc.) go to an underwriter who produces a legal loan document for all parties to sign.</li>
</ol>
<p>For the moment, we&rsquo;re leaving out some of the tributaries of this flow.</p>
<p>Notice how moving through the business process causes previous information to become effectively read-only?</p>
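<p>A small sketch of treating two of those stages as different things, using Python dataclasses. The field names are assumptions, and freezing the whole submitted record is a simplification, since the process above only says the amount is locked at that point.</p>

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass
class Proposal:
    """Draft stage: the requestor may still edit everything."""

    description: str
    amount: int


@dataclass(frozen=True)
class Project:
    """Submitted stage: a different type whose fields are read-only."""

    description: str
    amount: int


def submit(proposal: Proposal) -> Project:
    """Moving to the next stage creates a new thing from the old one."""
    return Project(description=proposal.description, amount=proposal.amount)


draft = Proposal("Solar oven", 5000)
draft.amount = 6000  # allowed while drafting
project = submit(draft)

try:
    project.amount = 1  # rejected after submission
except FrozenInstanceError:
    print("amount is read-only once submitted")
```

<p>There is no state field to check. The type of the thing tells you which changes are allowed.</p>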
<p>The original form of this system was a monolith that had a state field
on a &ldquo;Loan&rdquo; model object. That was a <em>wide</em> object, since
it had everything from the initial proposal through to the ultimate
payment. If we made that into a &ldquo;Loan&rdquo; microservice we would end up
with exactly an entity service, CRUD operations, and high coupling into
that service, as shown below.</p>
<p><img src="/images/blog/2018-01-05-services-by-lifecycle/with-loan-entity-service.svg" alt="With loan entity service"></p>
<p>Try playing CRC with this design. You&rsquo;ll find that all requests reach
the Loan service.</p>
<p>What is less evident from the diagram is the cost of embedding a
state model into the entity service directly. If we put a state field
on the Loan, then every Loan must go through the same state
machine. It locks us into a single kind of business process. At the
time, we already knew the company was exploring direct-funded loans
through a banking partner. So there would be a minimum of two flavors
of process. (Or one process with proliferating branches.)</p>
<p>I briefly considered using <a href="https://github.com/mtnygard/devs">my DEVS library</a> to represent the state plus state machine as EDN data on each Loan, but ultimately decided against it.</p>
<p>Instead, I thought we could make each state into its own service, as shown here.</p>
<p><img src="/images/blog/2018-01-05-services-by-lifecycle/with-lifecycle.svg" alt="With Lifecycle"></p>
<p>Now as the business process moves along, we&rsquo;re really sending
documents to each service. For example, from Proposal to Project, we
send a &ldquo;ProjectStarter&rdquo; document that contains all the attributes
needed for a Project. When the analyst approves a project, the analyst
GUI (or backend for same) creates a &ldquo;LoanStarter&rdquo; and sends it to the
UnfundedLoan service. Likewise, once all funding is received, the
&ldquo;Collection&rdquo; service creates a &ldquo;LoanPackage&rdquo; document and sends it to
the &ldquo;Underwriting&rdquo; service. (That&rsquo;s &ldquo;collection&rdquo; as in &ldquo;gatherer of
documents&rdquo; not &ldquo;collection&rdquo; as in &ldquo;break your kneecaps.&rdquo;) Further
downstream, we set up a schedule of payments to receive from the
requestor and payments to issue to the backers. We also keep a set of
ledgers to track balances per account.</p>
<p>Each of the services has facilities to add or update information
relevant to that service. It ignores anything in the incoming
documents that it doesn&rsquo;t need.</p>
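<p>A minimal sketch of a receiving service, assuming documents arrive as plain dictionaries. The attribute names and the <code>ProjectService</code> class are hypothetical, not taken from the real system.</p>

```python
class ProjectService:
    """Hypothetical receiver for ProjectStarter documents."""

    # The only attributes this stage cares about (assumed names).
    NEEDED = ("proposal_id", "description", "amount")

    def __init__(self) -> None:
        self.projects: dict[str, dict] = {}

    def receive(self, document: dict) -> None:
        # Copy what this service needs and ignore everything else.
        project = {key: document[key] for key in self.NEEDED}
        self.projects[project["proposal_id"]] = project


service = ProjectService()
service.receive(
    {
        "proposal_id": "p-1",
        "description": "Solar oven",
        "amount": 5000,
        "draft_notes": "not needed at this stage",
    }
)
print(service.projects["p-1"])
```

<p>Because the receiver picks out only what it needs, the sender can add attributes to the document without breaking it.</p>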
<p>This gives us a lot of flexibility in how we build the overall process.</p>
<p>Consider our direct-funding scenario. We need a new &ldquo;DirectFunding&rdquo;
service that finds suitable candidates. It sends a document out to the
bank and receives a response. On a favorable response, DirectFunding
can create its own version of the LoanPackage document for
underwriting. In other words, treating these stages as services
connected by well-defined document formats allows us to introduce more
pathways without creating the state machine from hell.</p>
<p>As an additional benefit, we can easily monitor the flow of documents
to see when the process is healthy. We can monitor each service&rsquo;s
activity to create a cumulative flow diagram. We get a lot of
visibility. And since some stages are triggered by humans (e.g., the
analysts) we can even figure out how our staff model must scale with
business throughput.</p>
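<p>As a sketch of that monitoring idea, suppose each service logs a (day, stage) pair whenever it receives a document. That log format is my assumption. The data behind a cumulative flow diagram is then just a running total per stage:</p>

```python
from collections import Counter
from itertools import accumulate


def cumulative_flow(arrivals, stages, days):
    """Cumulative count of documents that have reached each stage, per day.

    arrivals is an iterable of (day, stage) pairs, one per document received.
    """
    per_day = Counter(arrivals)
    return {
        stage: list(accumulate(per_day[(day, stage)] for day in days))
        for stage in stages
    }


# Invented sample log: three proposals created, two of them submitted.
log = [
    (1, "Proposal"),
    (1, "Proposal"),
    (2, "Proposal"),
    (2, "Project"),
    (3, "Project"),
]
flow = cumulative_flow(log, stages=["Proposal", "Project"], days=[1, 2, 3])
print(flow)
```

<p>The gap between two adjacent stages on any given day is the work in progress between them, which is where a stalled stage would show up.</p>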
<p>It should also be clear that this style works well with event transfer
instead of document transfer. It would be natural to put all the
documents onto a message bus.</p>
<p>Overall, I think this style offers a nice degree of alignment between
the technology and the business. The only &ldquo;downside&rdquo; I can see is that
it requires a service developer to understand how that service
contributes to value streams and business processes.</p>
<h2 id="backtracking-errors-and-races">Backtracking, Errors, and Races</h2>
<p>There is still a minute window of opportunity for <em>perceived</em>
inconsistency to sneak in. For example, what happens if the requestor
tries to change the proposal while the analyst is reviewing it? Or
worse, what if they change it in those milliseconds between when the
analyst clicks &ldquo;Approve&rdquo; in the GUI and when the document goes over to
the Project service? For that matter, how do we tell the Proposal
service that the proposal can no longer be edited without withdrawing
the request and resubmitting as a new Project?</p>
<p>This post is already getting too long, so I&rsquo;m going to answer those
questions next time. It shouldn&rsquo;t take another month since we&rsquo;re past
the holiday-fun-times and into the serious winter months.</p>
<hr/>
<p>If you&rsquo;re interested in learning more about breaking up monoliths, you
might like my
Monolith to Microservices
workshop.</p>
<p>I&rsquo;m hosting a session
this March
in sunny Florida. Especially to all my dear friends and colleagues
back home in Minnesota&hellip; we know that March is a great time to <em>not</em>
be in Minnesota.</p>
<p>Or, <a href="https://n6consulting.com/contact/">contact me</a> to schedule a
workshop at your company.</p>
]]></content></entry><entry><title>The Entity Service Antipattern</title><link href="https://michaelnygard.com/blog/2017/12/the-entity-service-antipattern/"/><id>https://michaelnygard.com/blog/2017/12/the-entity-service-antipattern/</id><published>2017-12-05T12:53:44-06:00</published><updated>2017-12-05T12:53:44-06:00</updated><content type="html"><![CDATA[
<p>
In my <a href="http://www.michaelnygard.com/blog/2017/11/keep-em-separated/">last post</a> I talked about the need to keep things separated once
they've been decoupled. Let's look at one of the ways this breaks
down: entity services.
</p>

<p>
If a pattern is a solution to a problem in a context, what is an
antipattern?  An antipattern is a commonly-rediscovered solution to a
problem in a context, that inadvertently creates a resulting context
we like less than the original context. In other words, it's a pattern
that makes things worse (according to some value system.)
</p>

<p>
I contend that "entity services" are an antipattern.
</p>

<p>
To make that case, I need to establish that "entity services" are a
commonly-rediscovered solution to a problem and that the resulting
context is worse than the starting context (a monolith.)
</p>

<p>
Let's start with the "commonly-rediscovered" part. Entity services are
in Microsoft's <a href="https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/multi-container-microservice-net-applications/data-driven-crud-microservice">.NET microservices architecture ebook</a>. Spring has a
<a href="https://spring.io/blog/2015/07/14/microservices-with-spring">tutorial with them</a>. (Spring may give us the absolute easiest way to
create an entity service. The same class can be annotated with JSON
mapping and persistence mapping.) Red Hat has a <a href="https://www.redhat.com/cms/managed-files/mi-microservices-eap-7-reference-architecture-201606-en.pdf">Microservice Reference
Architecture</a> with <span class="underline">product-service</span> and <span class="underline">sales-service</span>. Some of the
microservice-focused frameworks such as <a href="http://www.jhipster.tech/">JHipster</a> start with CRUD on
data entities.
</p>

<p>
In order to make the case that the resulting context is worse than the
starting context, I need to assume what that starting context actually
is. For the sake of generality, I'll assume a largish, legacy
application that is more-or-less a monolith. It may call out to some
integration points to get work done, but features are pretty much
local and in-process. There are multiple instances of the process
running on different hosts. Basically, like the following diagram.
</p>

<img src="/images/blog/2017-12-05-the-entity-service-antipattern/monolith.svg">
<div class="caption"></div>


<p>
All features reside in the code for the application instances.
</p>

<p>
Many other authors have enumerated the sins of the monolith, so I
won't belabor them here. (Though I feel compelled to make a brief
aside to say that we did somehow deliver quite a lot of working,
valuable features that ran in monoliths.)
</p>

<p>
How might we describe this initial context?
</p>

<ul class="org-ul">
<li>It is clear where the code to build a feature goes and how to test
the code.
</li>
<li>The release cadence is dictated by the slowest-delivering subteam.
</li>
<li>There is little inherent enforcement of boundaries, thus coupling
tends to increase over time.
</li>
<li>Performance problems can be found by profiling a single application.
</li>
<li>The cause of availability problems is typically found in one place.
</li>
<li>Building features that rely on multiple entities is straightforward,
though it may come at the cost of inappropriate coupling.
</li>
<li>As the code grows large, the organization is at risk of entering
<a href="http://www.michaelnygard.com/blog/2015/07/the-fear-cycle/">the fear cycle</a>.
</li>
<li>Feature availability may be compromised by inappropriate coupling
via common modes in the application. (E.g., thread pools, connection
pools.)
</li>
<li>Feature availability should be improved by redundancy of the whole
application itself. This is reduced, however, if the application is
vulnerable to the surrounding environment, as in the case of a
Self-Denial Attack, memory leak, or race condition.
</li>
</ul>

<p>
Suppose we move this to a microservice architecture, with entity
services. We might end up with something like this example from the
Spring tutorial:
</p>

<img src="/images/blog/2017-12-05-the-entity-service-antipattern/with-entity-services.svg">
<div class="caption"></div>


<p>
In this version, you should assume that each of the service boxes
comprises multiple instances of that service.
</p>

<p>
Obviously there are more moving parts involved. That immediately means
it's harder to maintain availability.
</p>

<p>
The challenges of performance analysis and debugging are well
documented, so I won't belabor them.
</p>

<p>
But in this resulting context, where do features get created? A few of
them are direct interactions of the "Online Shopping" service and the
individual entity services. For example, creating an account is
definitely just between Online Shopping and Accounts.
</p>

<p>
Most features, however, require more than one of the entities. They
will use aggregates or intersections of entities.
</p>

<p>
For example, suppose we need to calculate the total price of a cartful
of items. That involves the cart, the products (for their individual
prices) and the account to find the applicable sales tax or VAT. I
predict that it will be implemented in the Online Shopping service by
making a bunch of calls to the entity services to get their data.
</p>

<p>
We can depict this with an "activation set" (a term I made up) to
show which services are activated during processing of a single
request type. For this picture, we focus on just the services and
elide the infrastructure.
</p>

<img src="/images/blog/2017-12-05-the-entity-service-antipattern/activation-set-es.svg">
<div class="caption"></div>


<p>
So to price the cart, we have to activate four of the five services in
our architecture.
</p>

<p>
That activation represents operational coupling, which affects
availability, performance, and capacity.
</p>
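<p>
A back-of-the-envelope sketch of the availability part, in Python. It
assumes the request fails if any activated service is down and that the
services fail independently. Both are simplifications, and the 99.9%
figure is just an example.
</p>

```python
import math


def request_availability(service_availabilities: list[float]) -> float:
    """Availability of a request that needs every listed service to be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return math.prod(service_availabilities)


# Four activated services, each assumed to be available 99.9% of the time.
pricing = request_availability([0.999, 0.999, 0.999, 0.999])
print(f"price-the-cart availability: {pricing:.4%}")
```

<p>
Every service added to the activation set multiplies in another factor
below one, so the request can never be more available than its least
available dependency.
</p>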

<p>
It also represents semantic coupling. A change to any of the entity
services has the potential to ripple through into the online shopping
service. (In particularly bad cases, the online shopping service may
find itself brokering between data formats: translating version 5 of
the user data produced by Accounts into the version 3 format that Cart
expects.)
</p>

<p>
A common corollary to entity services is the idea of "stateless
business process services." I think this meme dates back to last
century with the original introduction of Java EE, with entity beans
and session beans. It came back with SOA and again with microservices.
</p>

<p>
What happens to our picture if we introduce a process service to
handle pricing the cart?
</p>

<img src="/images/blog/2017-12-05-the-entity-service-antipattern/activation-set-es2.svg">
<div class="caption"></div>


<p>
Not much improvement.
</p>

<p>
Bear in mind this is the activation set for just one request type. We
have to consider all the different request types and overlay their
activation sets. We'll find the entity services are activated for the
majority of requests. That makes them a problem for availability and
performance. It also means they won't be allowed to change as fast as
we'd like. (Services with a high fan-in need to be more stable.)
</p>
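<p>
Overlaying activation sets is easy to sketch in code. The request types
and their sets below are illustrative: the first two follow the examples
in this post, and "view product" is a hypothetical one I added to have
something to overlay.
</p>

```python
from collections import Counter

# Request type -> services activated while handling it (illustrative).
activation_sets = {
    "price cart": {"Online Shopping", "Cart", "Products", "Accounts"},
    "create account": {"Online Shopping", "Accounts"},
    "view product": {"Online Shopping", "Products"},
}


def activation_counts(sets_by_request: dict[str, set[str]]) -> Counter:
    """Count how many request types activate each service."""
    counts: Counter = Counter()
    for services in sets_by_request.values():
        counts.update(services)
    return counts


counts = activation_counts(activation_sets)
for service, hits in counts.most_common():
    print(f"{service}: activated by {hits} of {len(activation_sets)} request types")
```

<p>
Services near the top of that list are the ones whose outages and
changes will be felt by the most request types.
</p>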

<p>
So, let's look at the resulting context of moving to microservices
with entity services:
</p>

<ul class="org-ul">
<li>Performance analysis and debugging is more difficult. Tracing tools
such as Zipkin are necessary.
</li>
<li>Additional overhead of marshalling and parsing requests and replies
consumes some of our precious latency budget.
</li>
<li>Individual units of code are smaller.
</li>
<li>Each team can deploy on its own cadence.
</li>
<li>Semantic coupling requires cross-team negotiation.
</li>
<li>Features mainly accrue in "nexuses" such as API, aggregator, or UI
servers.
</li>
<li>Entity services are invoked on nearly every request, so they will
become heavily loaded.
</li>
<li>Overall availability is coupled to many different services, even
though we expect individual services to be deployed frequently. (A
deployment looks exactly like an outage to callers!)
</li>
</ul>


<p>
In summary, I'd say both criteria are met to label entity services as
an antipattern.
</p>

<p>
Stay tuned. In a future post, we'll look at what to do instead of
entity services.
</p>


<hr/>

<p>
If you're interested in learning more about breaking up monoliths, you might like my Monolith to Microservices workshop.
</p>
<p>
There is a session open to the public in March 2018.
</p>
<p>
Or, <a href="https://n6consulting.com/contact/">contact me</a> to schedule a workshop at your company.
</p>
]]></content></entry><entry><title>Keep 'Em Separated</title><link href="https://michaelnygard.com/blog/2017/11/keep-em-separated/"/><id>https://michaelnygard.com/blog/2017/11/keep-em-separated/</id><published>2017-11-27T15:45:31-06:00</published><updated>2017-11-27T15:45:31-06:00</updated><content type="html"><![CDATA[
<img src="/images/blog/2017-11-27-keep-em-separated/boundary-waters.jpg">
<div class="caption"></div>


<p>
Software doesn't have any natural boundaries. There are no rivers,
mountains, or deserts to separate different pieces of software. If two
services interact, then they have a sort of "attractive force" that
makes them grow towards each other. The interface between them becomes
more specific. Semantic coupling sneaks in. At a certain point, they
might as well be one module running in one process.
</p>

<p>
If you're building microservices, you need to make sure they don't
grow together into an impenetrable bramble. The whole microservice bet
is that we can trade deployment and operational complexity for
team-scale autonomy and development speed. The last thing you want is
to take on the operational complexity of microservices and <span class="underline">still</span>
move slowly due to semantic coupling among them.
</p>

<p>
Maybe you've recently broken up a monolith into microservices, but
found that things aren't as easy and rosy as the conference talks led
you to believe.
</p>

<p>
Maybe you have a microservice architecture that is starting to slow
down and get harder. Like cooling honey, it seems sweet at first but
gets stickier later.
</p>

<p>
I'm going to write a short series of posts with techniques to keep 'em
separated. This will go into API design for microservices, information
architecture, and feature design. It'll be all about making smaller,
more general pieces, that you can rearrange in interesting ways.
</p>

<hr/>

<p>
If you're interested in learning more about breaking up monoliths, you might like my Monolith to Microservices workshop.
</p>
<p>
There is a session open to the public in March 2018.
</p>
<p>
Or, <a href="https://n6consulting.com/contact/">contact me</a> to schedule a workshop at your company.
</p>
]]></content></entry><entry><title>Root Cause Analysis as Storytelling</title><link href="https://michaelnygard.com/blog/2017/11/root-cause-analysis-as-storytelling/"/><id>https://michaelnygard.com/blog/2017/11/root-cause-analysis-as-storytelling/</id><published>2017-11-08T17:01:06-06:00</published><updated>2017-11-08T17:01:06-06:00</updated><content type="html"><![CDATA[
<p>
Humans are great storytellers and even better story-listeners. We love
to hear stories so much that when there aren't any available, we make
them up on our own.
</p>

<p>
From an early age, children grasp the idea of narrative. Even if they
don't understand the forms of storytelling so much, you can hear a
four-year-old weave a linked list of events from her day.
</p>

<p>
We look for stories behind everything. At a deep level, we want the
world's events to <span class="underline">mean</span> something. Effect follows cause, and causes
have an actor to set them in motion.
</p>

<p>
Our sense of balance also demands that large effects should have large
causes, with correspondingly large intent.
</p>

<p>
A drunk driver speeds through a red light, oblivious. A crossing car stops
short. The shaken driver creeps home with a pounding pulse, full of
queasy adrenaline. She unbuckles her daughter and hugs her tightly.
</p>

<p>
A drunk driver speeds through a red light, oblivious. A crossing car
is in the intersection. The drunk smashes into it, right at the
driver's side door. The woman's bloody face is hidden behind
airbags. Her daughter sits in her new wheelchair for her mother's
funeral.
</p>

<p>
The difference between those stories is a matter of a split second in
timing. There is absolutely no change in the motives or desires of
anyone in the two vignettes.  The first drunk, if caught, would get a
jail term and large fine. He would probably lose his driver's
license.
</p>

<p>
But most people would judge the motives of the second driver far more
harshly. They would condemn him to a lengthy prison term and a
lifetime ban on driving.
</p>

<p>
When we see a large effect, we expect a large cause, with a large
intent.
</p>

<p>
The idea that some vast, horrible events strike randomly fills us with
dread. People can't bear the thought that a single unbalanced
<span class="underline">nobody</span> can change the course of a nation's history with one rifle
shot, so they spend more than 50 years searching for "the truth."
</p>

<p>
"Root Cause Analysis" expresses a desire for narrative. With the power
of hindsight, we want to find out what went wrong, who did it, and how
we can make sure it never happens again. But because we have the
posterior event, we judge the prior probabilities differently. Any
anomaly or blip suddenly becomes suspect.
</p>

<p>
People don't look as hard at anomalies when <span class="underline">nothing bad</span>
happens.
</p>

<p>
They don't notice all the times the same weird log message pops up
before &#x2026; everything continues as normal.
</p>

<p>
When we look for "root cause," what we are really trying to discern is
not "what made this happen." We are looking for something that would
have stopped it from happening. We are building a counterfactual
narrative&#x2014;an alternate history&#x2014;where that drunk driver dropped his
keys in the parking lot and was thereby delayed a few crucial
seconds.
</p>

<p>
Peel back the surface on a root cause analysis and you almost always
see a formula that goes like this: "factor X" could have prevented
this. "Factor X" was not present, therefore the bad event happened.
</p>

<p>
The catch is that there is usually an endless variety of possible
counterfactuals. Often, more than one counterfactual narrative would
have prevented the bad outcome equally well. Which one was the root
cause? Non-existence of "factor X" or non-existence of "factor Y?"
</p>

<p>
Next time you have a bad incident, why not try to focus your efforts
in a different way? Work on learning from the times that things
<span class="underline">don't</span> go wrong. And be explicit about looking for <span class="underline">many</span> possible
interventions that would have prevented the problem. Then select ones
with broad ability to prevent or impede many different problems.
</p>
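
<p>
One way to make "select ones with broad ability" concrete is simply to
count. The sketch below is only an illustration: the incidents and the
"factor" names are invented placeholders, and the assumption that you
can list which interventions would have prevented each incident is
mine.
</p>

```python
from collections import Counter

# Hypothetical review data: for each past incident, the interventions
# that would plausibly have prevented it. Every name here is invented.
incidents = {
    "incident A": {"factor X", "factor Y"},
    "incident B": {"factor Y", "factor Z"},
    "incident C": {"factor Y"},
}


def rank_interventions(incidents):
    """Count how many incidents each intervention would have prevented."""
    coverage = Counter()
    for preventers in incidents.values():
        coverage.update(preventers)
    # most_common() orders interventions by descending coverage.
    return coverage.most_common()


ranking = rank_interventions(incidents)
```

<p>
With this made-up data, "factor Y" comes out on top because it appears
in all three counterfactuals. None of the three incidents has to be
declared "the" root cause to reach that conclusion.
</p>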
]]></content></entry><entry><title>Release It Second Edition in Beta</title><link href="https://michaelnygard.com/blog/2017/08/release-it-second-edition-in-beta/"/><id>https://michaelnygard.com/blog/2017/08/release-it-second-edition-in-beta/</id><published>2017-08-24T08:09:51-05:00</published><updated>2017-08-24T08:09:51-05:00</updated><content type="html"><![CDATA[
<p>
I’m excited to announce the beta of <a href="https://pragprog.com/book/mnee2/release-it-second-edition">Release It! Second edition</a>.
</p>

<p>
It’s been ten years since the first edition was released. Many of the
lessons in that book hold strong. Some are even more relevant today
than in 2007. But a few things have changed. For one thing, capacity
management is much less of an issue today. The rise of the cloud means
that developers are more exposed to networks than ever. And in this
era of microservices, we’ve got more and better ops tools in the open
source world than ever.
</p>

<p>
All of that motivated me to update the book for this decade. I’ve
removed the section on capacity and capacity optimization and replaced
it with a section that builds up a picture of our systems by doing a
“slow zoom” out from the hardware, to single processes, to clusters,
to the controlling infrastructure, and to security issues.
</p>

<p>
The first beta does not yet include two additional new parts on
deployment and solving systemic problems. Those will be coming in the
next few weeks.
</p>

<p>
In the meanwhile, I look forward to hearing your comments and
feedback! Join the conversation in the <a href="https://forums.pragprog.com/forums/428">book's forum</a>.
</p>
]]></content></entry><entry><title>Spectrum of Change</title><link href="https://michaelnygard.com/blog/2017/06/spectrum-of-change/"/><id>https://michaelnygard.com/blog/2017/06/spectrum-of-change/</id><published>2017-06-23T11:26:33-05:00</published><updated>2017-06-23T11:26:33-05:00</updated><content type="html"><![CDATA[<p>I&rsquo;ve come to believe that every system implicitly defines a spectrum
of changes, ordered by their likelihood. As designers and developers,
we make decisions about what to embody as architecture, code, and data
based on known requirements and our experience and intuition.</p>
<p>We pick some kinds of changes and say they are so likely that we
should represent the current choice as data in the system. For
instance, who are the users? You can imagine a system where the user
base is so fixed that there&rsquo;s no data representing the user or
users. Consider a single-user application like a word processor.</p>
<p>Another system might implicitly indicate there is just one community
of users. So there&rsquo;s no data that represents an organization of
users&ndash;it&rsquo;s just implicit. On the other hand, if you&rsquo;re building a
SaaS system, you expect the communities of users to come and
go. (Hopefully, more come than go!) So you make whole communities into
data because you expect that population to change very rapidly.</p>
<p>If you are building a SaaS system for a small, fixed market you might
decide that the population <em>won&rsquo;t</em> change very often. In that case,
you might represent a population of users in the architecture via
instancing.</p>
<p>So data is at the high-energy end of the spectrum, where we expect
constant change. Next would be decisions that are contemplated in code
but only made concrete in configuration. These aren&rsquo;t quite as easy to
change as data. Furthermore, we expect that only one answer to any
given configuration choice is operative at a time. That&rsquo;s in contrast
to data where there can be multiple choices active simultaneously.</p>
<p>Below configuration are decisions represented explicitly in
code. Constructs like policy objects, strategy patterns, and plugins
all indicate our belief that the answer to a particular decision will
change rapidly. We know it is likely to change, so we localize the
current answer to a single class or function. This is the origin of
the &ldquo;Single Responsibility Principle.&rdquo;</p>
<p>Farther down the spectrum, we have cross-cutting behavior in a single
system. Logging, authentication, and persistence are the typical
examples here. Would it be meaningful to push these up to a
higher level like configuration? What about data?</p>
<p>Then we have those things which are so implicit to the service or
application that they aren&rsquo;t even represented. Everybody has a story
about when they had to make one of these explicit for the first
time. It may be adding a native app to a Web architecture, or going
from single-currency, single-language to multinational.</p>
<p>Next we run into things that we expect to change very rarely. These
are cross-cutting behavior across multiple systems. Authentication
services and schemas often land at this level.</p>
<p>So the spectrum goes like this, from high energy, rapidly changing,
blue to cool, sedate red:</p>
<ul>
<li>Data</li>
<li>Configuration</li>
<li>Encapsulated code</li>
<li>Cross-cutting code</li>
<li>Implicit in application</li>
<li>Cross-cutting architecture</li>
</ul>
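<p>To make a few of these levels concrete, here is a rough sketch of one
decision, how to format a price, placed at four different points on the
spectrum. The names (<code>DollarFormat</code>, <code>CONFIG</code>,
<code>Order</code>) are invented for illustration and are not taken from
any real system.</p>

```python
from dataclasses import dataclass


# Implicit in application: the decision is not represented anywhere.
# The code simply assumes dollars.
def format_price_implicit(amount):
    return f"${amount:.2f}"


# Encapsulated code: the current answer is localized in one policy object.
class DollarFormat:
    def format(self, amount):
        return f"${amount:.2f}"


class EuroFormat:
    def format(self, amount):
        return f"{amount:.2f} EUR"


FORMATS = {"dollar": DollarFormat(), "euro": EuroFormat()}

# Configuration: the code contemplates several answers, but exactly one
# is operative at a time.
CONFIG = {"currency_format": "euro"}


def format_price_configured(amount):
    return FORMATS[CONFIG["currency_format"]].format(amount)


# Data: many answers are active simultaneously, one per record.
@dataclass
class Order:
    amount: float
    currency_format: str


def format_order(order):
    return FORMATS[order.currency_format].format(order.amount)
```

<p>Each step up the list makes the decision cheaper to change. It also
adds machinery that exists only to carry the decision around.</p>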
<h2 id="implications">Implications</h2>
<p>The farther toward the &ldquo;red&rdquo; end of the spectrum we relegate a
concern, the more tectonic it will be to change it.</p>
<p>No particular decision &ldquo;naturally&rdquo; falls at one level or another. We
just have experience and intuition about which kinds of changes happen
with greatest frequency. That intuition isn&rsquo;t always right.</p>
<p>Efforts to make everything into data in the system lead to rules
engines and logic programming. That doesn&rsquo;t usually deliver the
end-user control we expect. It turns out you still need programmers to
think through changes to rules in a rules engine. Instead of
democratizing the changes, you&rsquo;ve made them more esoteric.</p>
<p>It&rsquo;s also not feasible to hoist everything up to be data. The more
decisions you energy-boost to that level, the more it costs. And at
some point you generalize enough that all you&rsquo;ve done is create a new
programming language. If everything about your application is data,
you&rsquo;ve written an interpreter and recursed one level higher. Now you
still have to decide how to encode everything in that new language.</p>
]]></content></entry><entry><title>Queuing for QA</title><link href="https://michaelnygard.com/blog/2017/05/queuing-for-qa/"/><id>https://michaelnygard.com/blog/2017/05/queuing-for-qa/</id><published>2017-05-01T20:40:29-05:00</published><updated>2017-05-01T20:40:29-05:00</updated><content type="html"><![CDATA[
<p>
Queues are the enemy of high-velocity flow. When we see them in our
software, we know they will be a performance limiter. We should look
at them in our processes the same way.
</p>

<p>
I've seen meeting rooms full of development managers with a chart of
the year, trying to allocate which week each dev project will enter
the QA environment. Any project that gets done too early just has to
wait its turn in QA. Entry to QA becomes a queue of its own. And as
with any queue, the more variability in the processing time, the more
likely the queue is to back up and get delayed.
</p>
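
<p>
The effect of variability can be put into numbers with the
Pollaczek-Khinchine formula for a single-server queue. This is a sketch
under assumptions that are mine, not part of the scenario above: one QA
environment, projects arriving at random, and processing time
summarized by its mean and coefficient of variation.
</p>

```python
def mean_wait_in_queue(utilization, mean_service_time, service_cv):
    """Mean time spent waiting in an M/G/1 queue (Pollaczek-Khinchine).

    utilization       -- fraction of time the server is busy, in [0, 1)
    mean_service_time -- average time one job occupies the server
    service_cv        -- coefficient of variation of the service time
                         (standard deviation divided by mean)
    """
    if utilization >= 1 or 0 > utilization:
        raise ValueError("utilization must be in [0, 1)")
    variability = (1 + service_cv ** 2) / 2
    return utilization / (1 - utilization) * variability * mean_service_time


# Assumed numbers: QA is busy 80% of the time, two weeks per project.
steady = mean_wait_in_queue(0.8, 2.0, service_cv=0.0)
erratic = mean_wait_in_queue(0.8, 2.0, service_cv=2.0)
```

<p>
With those assumed numbers, perfectly regular test cycles wait roughly
4 weeks on average, while highly erratic ones wait roughly 20. The load
on QA is identical in both cases.
</p>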

<p>
When faced with a situation like that, the team may look for the
"right number" of QA environments to build. There is no right
number. Any fixed number of environments just changes the queuing
equation but keeps the queue. A much better answer is to change the
rules of the game. Instead of having long-lived (in other words,
broken and irreproducible) QA environments, focus on creating a
machine for stamping out QA environments. Each project should be able
to get its own disposable, destructible QA system, use it for the
duration of a release, and discard it.
</p>
]]></content></entry><entry><title>Availability and Stability</title><link href="https://michaelnygard.com/blog/2016/11/availability-and-stability/"/><id>https://michaelnygard.com/blog/2016/11/availability-and-stability/</id><published>2016-11-27T13:26:17-06:00</published><updated>2016-11-27T13:26:17-06:00</updated><content type="html"><![CDATA[
<p>
<a href="http://www.michaelnygard.com/blog/2016/11/fault-error-failure">Last post</a> covered technical definitions of <b>fault</b>, <b>error</b>, and
<b>failure</b>. In this post we will apply these definitions in a system.
</p>

<p>
Our context is a long-running service or server. It handles requests
from many different consumers. Consumers may be human users, as in the
case of a web site, or they may be other programs.
</p>

<p>
Engineering literature has many definitions of "availability." For our
purpose we will use observed availability. That is the probability
that the system survives between the time a request is submitted and
the time it is retired. Mathematically, this can be expressed as the
probability that the system does not fail between time T<sub>0</sub> and T<sub>1</sub>,
where the difference T<sub>1</sub> - T<sub>0</sub> is the request latency.
</p>
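
<p>
As a rough illustration, suppose failures arrive at a constant rate.
That exponential model is my assumption for the sketch and is not part
of the definition. The probability of surviving a request then depends
only on the failure rate and the request latency.
</p>

```python
import math


def observed_availability(failure_rate, t0, t1):
    """Probability of no failure between t0 and t1.

    Assumes a constant failure rate (exponential time-to-failure), which
    is a modeling simplification. failure_rate is failures per unit time.
    """
    latency = t1 - t0
    if 0 > latency:
        raise ValueError("t1 must not be earlier than t0")
    return math.exp(-failure_rate * latency)


# Assumed rate: one failure per 1000 seconds on average.
fast_request = observed_availability(0.001, 0.0, 0.1)
slow_request = observed_availability(0.001, 0.0, 10.0)
```

<p>
Under this model, a slower request sees lower availability from the very
same system, simply because it is exposed to failure for longer.
</p>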

<p>
(There is a subtle issue with this definition of observed
availability, but we can skirt it for the time being. It intrinsically
assumes there is some other channel by which we can detect failures in
the system. In a pure message-passing network such as TCP/IP, there is
no way to distinguish between "failed" and "really, really slow." From
the consumer's perspective, "too slow" <i>is</i> failed.)
</p>

<p>
The previous post established that faults will occur. To maintain
availability, we must prevent faults from turning into failures. At
the component level, we may apply fault-tolerance or
fault-intolerance. Either way, we must assume that components will
lose availability.
</p>

<p>
<b>Stability</b>, then, is the architectural characteristic that allows a
system to maintain availability in the face of faults, errors, and
partial failures.
</p>

<p>
At the system level, we can create stability by applying the
principles of recovery-oriented computing.
</p>

<ol class="org-ol">
<li><i>Severability</i>. When a component is malfunctioning, we must be able
to cut it off from the rest of the system. This must be done
dynamically at runtime. That is, it must not require changes to
configuration or rebooting of the system as a whole.
</li>
<li><i>Tolerance</i>. Components must be able to absorb "shocks" without
transmitting or amplifying them. When a component depends on
another component which is failing or severed, it must not exhibit
higher latency or generate errors.
</li>
<li><i>Recoverability</i>. Failing components must be restarted without
restarting the entire system.
</li>
<li><i>Resilience</i>. A component may have higher latency or error rate
when under stress from partial failures or internal
faults. However, when the external or internal condition is
resolved, the component must return to its previous latency and
error rate. That is, it must display no lasting damage from the
period of high stress.
</li>
</ol>

<p>
Of these characteristics, recoverability may be the easiest to achieve
in today's architectures. Instance-level restarts of processes,
virtual machines, or containers all achieve recoverability of
damaged components.
</p>

<p>
The remaining characteristics can be embedded in the code of a
component via Circuit Breakers, Bulkheads, and Timeouts.
</p>
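
<p>
To give a flavor of what that looks like in code, here is a minimal
Circuit Breaker sketch. It is deliberately simplified: single-threaded,
with an invented class name and defaults. It is not the implementation
from any particular library.
</p>

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None       # None means the breaker is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.reset_timeout > self.clock() - self.opened_at:
                # Open: fail fast instead of waiting on a sick dependency.
                raise CircuitOpenError("dependency is severed")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

<p>
Opening the breaker severs the failing dependency at runtime, and
failing fast keeps the caller's latency from climbing. A real version
would also need thread safety and a timeout around
<code>operation</code> itself.
</p>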
]]></content></entry><entry><title>Fault, Error, Failure</title><link href="https://michaelnygard.com/blog/2016/11/fault-error-failure/"/><id>https://michaelnygard.com/blog/2016/11/fault-error-failure/</id><published>2016-11-27T13:01:50-06:00</published><updated>2016-11-27T13:01:50-06:00</updated><content type="html"><![CDATA[
<p>
Our systems suffer many insults when they contact the real
world. Flaky inputs, unreliable networks, and misbehaving users, to
name just a few. As we design our components and systems to thrive in
the only environment that matters, it pays to have mental schema and
language to discuss the issues.
</p>

<p>
A <b>fault</b> is an incorrect internal state in your software. Faults are
often introduced at component, module, or subsystem boundaries. There
can be a mismatch between the contract a module is designed to
implement and its actual behavior. A very simple example is accepting
a negative integer or zero when a strictly positive integer was
expected.
</p>
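
<p>
A boundary check for that example might look like the sketch below. The
function name and the choice to raise <code>ValueError</code> are mine,
purely for illustration.
</p>

```python
def parse_quantity(raw):
    """Accept only strictly positive integers at the module boundary."""
    quantity = int(raw)
    if 1 > quantity:
        # Letting zero or a negative number through would plant a fault:
        # an internal state that the rest of the module never expects.
        raise ValueError(f"quantity must be strictly positive, got {quantity}")
    return quantity
```

<p>
Rejecting the value at the boundary keeps the incorrect state from ever
existing inside the module.
</p>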

<p>
A fault may also occur when a latent bug in the software is triggered
by an external or internal condition. For example, attempting to
allocate an object when memory is exhausted will return a null
pointer. If the software proceeds with the null pointer it can cause
problems later, perhaps in a far distant part of the code.
</p>

<p>
Such an incorrect state may be recoverable. A fault-tolerant module
will attempt to restore a good internal state after detecting a
fault. Exception handlers and error-checking code are efforts to
provide fault-tolerance.
</p>

<p>
Another school of thought says that fault tolerance is
unreliable. In this approach, once a fault has occurred, the entire
memory state of the program must be regarded as corrupt. Instead of
attempting to restore a good state by backtracking or patching up the
internal state, fault-intolerant modules will exit to avoid producing
errors. A system built from these fault-intolerant modules will include
supervisor capabilities to restart exited modules.
</p>
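
<p>
A toy version of the supervisor idea follows. Real supervisors restart
separate processes, so treat this in-process loop as an analogy only.
The function name and the restart limit are assumptions of the sketch.
</p>

```python
def supervise(module, max_restarts=3):
    """Run module(); if it exits with an exception, start it again.

    Each restart calls module() from scratch, so no state from the
    faulted run is carried over. After max_restarts failed restarts the
    last exception is re-raised so that a higher-level supervisor can
    decide what to do.
    """
    restarts = 0
    while True:
        try:
            return module()
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
```

<p>
The module itself contains no recovery logic at all. It only has to exit
promptly when something goes wrong.
</p>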

<p>
If a fault propagates in the system, it can produce visibly incorrect
behavior. This is an <b>error</b>. Faults may occur without producing
errors, as in the case of fault-tolerant modules that correct their
own state before an error is observed. An error may be limited to an
incorrect output displayed to a user. It can include any incorrect
behavior, including data loss or corruption, network flooding, or
launching attack drones.
</p>

<p>
At the component, module, or subsystem level, our mission is to prevent
<i>faults</i> from causing <i>errors</i>.
</p>

<p>
A <b>failure</b> results when a system terminates without completing its
job. For a long-running service or server, it stops responding to
requests in a finite time. For a program that should run to completion
and exit, it exits abnormally before completing. A failure may be
preferable to an error, depending on the harm caused by the error.
</p>

<p>
Next time, I will address system stability in the face of faults,
errors, and failures.
</p>
]]></content></entry><entry><title>Power Systems</title><link href="https://michaelnygard.com/blog/2016/09/power-systems/"/><id>https://michaelnygard.com/blog/2016/09/power-systems/</id><published>2016-09-05T10:14:18-05:00</published><updated>2016-09-05T10:14:18-05:00</updated><content type="html"><![CDATA[
<p>
This is an excerpt from something I'm working on this Labor Day holiday:
</p>

<p>
&#x2013;
</p>

<p>
Large-scale power outages act a lot like software failures. They start
with a small event, like a power line grounding out on a
tree. Ordinarily that would be no big deal but under high-stress
conditions it can turn into a cascading failure that affects millions
of people.  We can also learn from how power gets restored after an
outage. Operators must perform a tricky balancing act between
generation, transmission, and demand.
</p>

<p>
There used to be a common situation where power would be restored and
then cut off again in a matter of seconds. It was especially common in
the American South, where millions of air conditioners and
refrigerators would all start at the same time. When a motor starts
up, it draws a lot of current. You can see this in the way that lights
dim when you start a circular saw. As the motor starts to spin,
though, it creates "back EMF"&#x2013;a kind of backpressure on the
electrical current. (That's when the lights return to full
brightness.) If you add up the effects of millions of electric motors
starting all at once, you see a huge upward blip in current draw,
followed by a quick drop due to back current. Power transmission
systems would see the spike and drop and propagate that to the
generation systems. First they would increase their draw then drop it
dramatically. That would make the generation systems think they should
shut off some of the turbines. Right about the time they started
reducing supply, the initial surge of back EMF would decline and
current load would come back up to baseline levels. The increased
current load hit just when supply was declining, causing excess demand
to trip circuit breakers. Lights out, again.
</p>

<p>
Smarter appliances and more modern control systems have mitigated that
particular failure mode now, but there are still useful lessons for
us.
</p>
]]></content></entry><entry><title>Remember DAT?</title><link href="https://michaelnygard.com/blog/2016/07/remember-dat/"/><id>https://michaelnygard.com/blog/2016/07/remember-dat/</id><published>2016-07-29T13:18:11-05:00</published><updated>2016-07-29T13:18:11-05:00</updated><content type="html"><![CDATA[
<p>
Do you remember Digital Audio Tape? DAT was supposed to have all the
advantages of digital audio&#x2014;high fidelity and perfect
reproduction&#x2014;plus the "advantages" of tape. (Presumably those
advantages did not include melting on the dashboard of your Chevy
Chevelle or spontaneously turning into The Best of Queen after a
fortnight.)
</p>

<p>
In hindsight, we can see that DAT was a twilight product. As the sun
set on the cassette era, DAT was an attempt to bridge the
discontinuous technology change to digital music production. It was a
twilight product because it didn't sufficiently reimagine the existing
technology to offer enough of a new advantage nor did it eliminate
enough of the old disadvantages.
</p>

<p>
We often see such twilight products right before a major,
discontinuous shift.
</p>

<p>
I think we're in such a period when it comes to software development
and deployment for cloud native systems. The tools we have attempt to
take the traditional model into the new environment. But they don't
yet reimagine the world of software development enough. Ten years from
now, we will see that they offered some advantages but also carried
forward baggage from the client-server era: Unix-like full operating
systems, coding one process at a time, treating the network as a secondary
concern, and ignoring the memory hierarchy in the languages.
</p>

<p>
Whatever the "operating system for the cloud" is, we haven't seen it
yet.
</p>
]]></content></entry><entry><title>QA Instability Implies Production Instability</title><link href="https://michaelnygard.com/blog/2016/07/qa-instability-implies-production-instability/"/><id>https://michaelnygard.com/blog/2016/07/qa-instability-implies-production-instability/</id><published>2016-07-14T10:06:11-05:00</published><updated>2016-07-14T10:06:11-05:00</updated><content type="html"><![CDATA[
<p>
Many companies that have trouble delivering software on time exhibit a
common pathology. Developers working on the next release are
frequently interrupted for production support issues with the current
release. These interrupts never appear in project schedules but can
take up half of the developers' hours. When you include the cost of
task-switching, this means less than half of their available time is
spent on the new feature work.
</p>

<p>
Invariably, when I see a lot of developer effort in production support
I also find an unreliable QA environment. It is both unreliable in
that it is frequently not available for testing, and unreliable in the
sense that the system's behavior in QA is not a good predictor of its
behavior in production.
</p>

<p>
QA environments are often configured differently than production. Not
just in the usual sense of consuming a QA version of external
services, but also in more basic ways. Server topology may be
different. Memory settings, capacity ratios, and the presence of
network components can all vary. QA often has a much simpler traffic
routing scheme than production, particularly when a CDN is involved.
</p>

<p>
The other major source of QA unavailability has to do with data
refreshes. QA environments either run with a minuscule, curated test
data set, or they use some form of backward migration from production
data. Each backward migration can be very disruptive, leading to one
or more days where QA is not available.
</p>

<p>
Disruption arises when testers have to do manual data setup in order
to test new features. These setups get overwritten with the next
refresh. Sometimes, production data must be cleansed or scrubbed of
PII before use in QA. This cleansing process often introduces its own
data quality problems. The backward migration process must also be
kept up to date so it can propagate data back into the schema for the
next release. This requires copying data and schema into QA, then
forward-migrating the schema according to the new release.
</p>

<p>
When many teams contend to get into a QA environment, that contention
can result in lost time as well. Time is lost in delays when one team
cannot move their code into QA during another team's test. It is also
lost when one team overwrites test data that a different team had set
up. And it can be lost when one team's code has bugs that prevent
other teams from proceeding with their tests. Suppose one team
works on login and registration, while another team works on friend
requests. Clearly, the friend requests team cannot do their testing
when login is broken. This last issue also applies across service
boundaries: a service consumer may not be able to test because the QA
version of their service provider is broken.
</p>

<p>
Finally, problems in QA simply take a lower priority than problems in
production. Thus, the operations team may be fully consumed with
current production issues, leaving the QA environment broken for
extended periods. In a vicious feedback loop, this makes it likely
that the next release will also create production problems.
</p>

<p>
My recommendations are these:
</p>

<ul class="org-ul">
<li>Give priority to well-functioning test environments.
</li>
<li>Virtualize your test environments, so you can avoid inter-team
dependencies on a QA environment.
</li>
<li>Automate the backward data propagation, and make it part of spinning
up a QA environment. When you must scrub PII, automate that process
so that every QA environment can draw from a snapshot of cleansed
data without impinging on the production DBAs.
</li>
<li>If your QA stays unavailable because there are too many production
issues, recognize that this is a self-sustaining pattern. You can
temporarily redirect a "SWAT" team to fix QA and it will pay
dividends for all future releases.
</li>
</ul>
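
<p>
As a small illustration of the PII recommendation, scrubbing can be a
repeatable function instead of a manual step. The field names below are
hypothetical. Note also that a bare hash is not real anonymization; a
production scrubber would need salting or tokenization.
</p>

```python
import hashlib


def scrub_record(record, pii_fields=("name", "email")):
    """Return a copy of record with PII fields replaced by stable pseudonyms.

    The same input always maps to the same pseudonym, so values that
    matched each other before scrubbing still match afterward.
    """
    scrubbed = dict(record)
    for field in pii_fields:
        if field in scrubbed:
            raw = str(scrubbed[field]).encode("utf-8")
            digest = hashlib.sha256(raw).hexdigest()
            scrubbed[field] = f"{field}-{digest[:12]}"
    return scrubbed


# Hypothetical production row.
row = {"id": 42, "name": "Ada Example", "email": "ada@example.com", "plan": "pro"}
clean = scrub_record(row)
```

<p>
Because the function is deterministic, every freshly stamped QA
environment sees the same cleansed snapshot. That goes some way toward
avoiding data quality surprises between refreshes.
</p>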
]]></content></entry><entry><title>Wittgenstein and Design</title><link href="https://michaelnygard.com/blog/2016/07/wittgenstein-and-design/"/><id>https://michaelnygard.com/blog/2016/07/wittgenstein-and-design/</id><published>2016-07-10T09:27:27-05:00</published><updated>2016-07-10T09:27:27-05:00</updated><content type="html"><![CDATA[
<p>
What does a philosopher born in the 19th Century have to say about
software design? More than you might think, particularly his ideas
about family resemblance.
</p>

<p>
Wittgenstein used the subject of "games" to illustrate an idea. We'll
start with a counter-example. Suppose we operate with the
then-prevailing notion that words are defined like sets in axiomatic
set theory. Then there is a decision procedure that will let us decide
whether something is a member of the set "games" or not. Such a
decision procedure must include everything that is a game and exclude
everything which is not a game. Can we define such a decision
procedure?
</p>

<p>
Does a game require competition? Some do. Not all.
</p>

<p>
Does a game have a score? Or an objective? Not all.
</p>

<p>
Does a game involve more than one person? Not necessarily.
</p>

<p>
Is a game a frivolous expenditure of energy? Some are. Others have
deep moral and philosophical lessons.
</p>

<p>
How is a game of football like a game of solitaire?
</p>

<p>
It's easy to see that mancala and go have something in
common&#x2026; little rounded stones. But what do they have in common with
Minecraft? Stones?
</p>

<p>
Wittgenstein said that this is not an issue for set theory. Instead,
he talked about <a href="https://en.wikipedia.org/wiki/Family_resemblance">family resemblances</a>. As described in Wikipedia,
"things which could be thought to be connected by one essential common
feature may in fact be connected by a series of overlapping
similarities, where no one feature is common to all."
</p>

<p>
For games, this means there is no single feature that makes something
a game. Instead, there are a set of overlapping similarities that make
things more gamelike or less gamelike. We can even think about things
that share more of the features as being more like each other. So go
and mancala share features like: two players, stones on a board,
alternating turns, one winner, ancient, cerebral, positional. This
makes them pretty similar. A professionally played team sport with a
ball on a field shares few qualities with go. (Although "people
excited about the outcome" and "positional" might be common.) So the
feature-distance between go and football is large, yet they are both
still games.
</p>
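
<p>
One way to put a number on "feature-distance" is Jaccard distance over
sets of features. The choice of metric is mine, not Wittgenstein's, and
the feature sets are just the ones listed above.
</p>

```python
def feature_distance(a, b):
    """Jaccard distance: 0.0 for identical feature sets, 1.0 for disjoint ones."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1 - len(a.intersection(b)) / len(union)


# Features as listed in the text; grouping them into sets is my own step.
SHARED = {"two players", "stones on a board", "alternating turns",
          "one winner", "ancient", "cerebral", "positional"}
go = SHARED.union({"people excited about the outcome"})
mancala = set(SHARED)
football = {"professionally played", "team sport", "ball on a field",
            "people excited about the outcome", "positional"}

near = feature_distance(go, mancala)    # almost every feature is shared
far = feature_distance(go, football)    # only two features overlap
```

<p>
Nothing in the function asks whether either argument "is a game." It
only measures how much two things resemble each other.
</p>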

<p>
I think this relates to the tasks of software design and
architecture. We have a strong tendency to go looking for nouns in our
designs. Once we find a noun in a domain, we want to make a software
artifact that captures all members of the set induced by that
noun. But that only works if we stick with axiomatic set theory. Set
theory works well for well-defined technical concepts and much less
well for things in the human sphere.
</p>

<p>
One simple example, the humble "name" field. Go read
<a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/">Falsehoods
programmers believe about names</a>. How do you feel about that "first
name", "last name" database structure now? After reading that list,
how much can you confidently say about instances of a "Name" class? Or
a "Name" service?
</p>

<p>
We have all these debates about "noun-first" or "verb-first". Back in
<a href="http://www.michaelnygard.com/blog/2015/04/the-perils-of-semantic-coupling/">The Perils of Semantic Coupling</a>, I argued for a behavior-oriented
approach rather than a noun-oriented approach. Stop asking "what is this
thing?" and ask instead "what can you do with it?" That leads us toward
segregated interfaces.
</p>

<p>
Now I'd augment that to emphasize those feature descriptions rather
than noun-like descriptions. Instead of noun-first or verb-first I'm
going to try "adjective-first".
</p>
]]></content></entry><entry><title>In Love With Your Warts</title><link href="https://michaelnygard.com/blog/2016/04/in-love-with-your-warts/"/><id>https://michaelnygard.com/blog/2016/04/in-love-with-your-warts/</id><published>2016-04-08T14:24:17-04:00</published><updated>2016-04-08T14:24:17-04:00</updated><content type="html"><![CDATA[
<p>
If you help other people solve problems, you may have run into this
phenomenon: a person gleefully tells you how messed up their
environment is. Processes make no sense, roadblocks are everywhere,
and all previous attempts to make things better have failed. As you
explore the details, you talk about directions to try. But every
suggestion is met with an explanation of why it won't work.
</p>

<p>
I say that these folks are "in love with their own warts." That is,
they know there's a problem, but they've somehow been acclimated to it
to such a degree that they can't imagine a different world. They will
consistently point to outside agents as the author of their woes,
without realizing how much resistance they generate themselves.
</p>

<p>
Over time, by the way, there's a reinforcing process. People who think
and talk this way will cluster and drive out the less cynical.
</p>

<p>
These people can be intensely frustrating to work with, until you
understand them. Understanding allows empathy, which is the only way
to get past that self-generated resistance.
</p>

<p>
The first thing to understand is that any conversation about their
problems isn't really about their problems. An opening statement like,
"We tried that but it didn't work," isn't really asking for a
solution. Instead, it's an invitation to play a game. That game is
called, "Stump the expert." The player wins when you concede that
nothing can ever improve. You "win" by suggesting something that the
player cannot find an objection to. It's not a real victory though,
for reasons that will be clear in a moment.
</p>

<p>
Why does the player want to win this game instead of improving their
world? For one thing, any solution you find is an implicit critique of
the person who has been there. Suppose the solution is to shift a
responsibility from one team to another. That requires management
support in both teams. If that solution works, then it means the
game-player <i>could have</i> produced the same improvement ages ago, but
didn't have enough courage to make it happen. Other changes might
imply the game-player lacked sufficient authority, vision,
credibility, or, rarely, technical acumen.
</p>

<p>
In every case, the game-player feels that your solution highlights a
deficiency of theirs.
</p>

<p>
This is why "winning" the discussion isn't really a win. You may get a
grudging concession about the avenue to explore, but you're still
generating more resistance from that game-player.
</p>

<p>
My usual approach is to decline the invitation to the game. I don't
try to find point-by-point answers to things that have failed in the
past. I usually draw analogies to other organizations that have faced
the same challenges and make parallels to their solutions. Failing
that, I accept the objections (almost always phrased as roadblocks
thrown up by others) and just tell them, "Let me handle that." (Most
of the time, I find that people on the opposite side of a boundary
express roadblocks from the other side that all eventually cancel each
other out. That is, the roadblock turns out to be illusory.)
</p>

<p>
I'd like to hear from you, dear Reader. Assume that you cannot simply
fire or transfer the game-player. They have value beyond this
particular habitual reflex.
</p>

<p>
How would you handle a situation like this?  What have you tried, and
what works?
</p>
]]></content></entry><entry><title>Some Useful Techniques From Bygone Eras</title><link href="https://michaelnygard.com/blog/2016/03/some-useful-techniques-from-bygone-eras/"/><id>https://michaelnygard.com/blog/2016/03/some-useful-techniques-from-bygone-eras/</id><published>2016-03-02T13:06:44-06:00</published><updated>2016-03-02T13:06:44-06:00</updated><content type="html"><![CDATA[
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">CRC</h2>
<div class="outline-text-2" id="text-1">
<p>
I find the old object-oriented design technique of <a href="http://c2.com/cgi/wiki?CrcCard">CRC Cards</a> to be
useful when defining service designs. CRC is short for "Class,
Responsibilities, Collaborators." It's a way to define what behavior
belongs inside a service and what behavior it should delegate to other
services.
</p>

<p>
Simulating a system via CRC is a good exercise for a group. Each
person takes a CRC card and plays the role of that service. A request
starts from outside the services and enters with one person. They can
do only those things written down as their "responsibilities." For
anything else, they must send a message to someone else.
</p>

<p>
Personifying and role-playing really help identify gaps in service
design. You'll find gaps where data or entire services are needed but
don't exist.
</p>
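
<p>
As a rough sketch of the role-play described above, each card can be
modeled as an object that either does a task itself or passes it to a
collaborator. The card names and tasks here are hypothetical, chosen
only for illustration.
</p>

```python
class ServiceCard:
    """One CRC card: a name, its responsibilities, and its collaborators."""

    def __init__(self, name, responsibilities, collaborators=()):
        self.name = name
        self.responsibilities = set(responsibilities)
        self.collaborators = list(collaborators)

    def handle(self, task, trace):
        """Do the task if it is on the card; otherwise pass it along."""
        trace.append(self.name)
        if task in self.responsibilities:
            return self.name
        for other in self.collaborators:
            if other.name in trace:
                continue  # never bounce a message back to someone already asked
            owner = other.handle(task, trace)
            if owner is not None:
                return owner
        return None  # nobody owns this task: a gap in the design


# Hypothetical cards for a small ordering system.
inventory = ServiceCard("inventory", {"reserve stock"})
billing = ServiceCard("billing", {"charge card"})
orders = ServiceCard("orders", {"accept order"}, [inventory, billing])


def simulate(entry, task):
    """Walk one request through the cards; return (owner, path taken)."""
    trace = []
    return entry.handle(task, trace), trace
```

<p>
A request that comes back with no owner is the code equivalent of
everyone at the table shrugging: it points at a responsibility that no
card claims.
</p>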
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Tell, Don't Ask</h2>
<div class="outline-text-2" id="text-2">
<p>
The more services you have, the more operational complexity you take
on. Some patterns of service design seem to encourage high coupling
and a high fan-in to a small number of critical services. (Usually,
these are "entity" services&#x2026; i.e., CRUDdy REST over a database
table.)
</p>

<p>
Instead, I find it better to tell services what you want them to
do. Don't ask them for information, make a decision, then change some
state.
</p>

<p>
Organizing around Tell, Don't Ask leads you to design services around
behavior instead of data. You'll probably find you denormalize your
data to make T,DA work. That's OK. The runtime benefit of cleaner
coupling will be worth it.
</p>
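
<p>
A minimal sketch of the difference, using a hypothetical account
service. In the "ask" style the caller reads state, decides, and writes
state back. In the "tell" style the caller states intent and the
decision stays inside the service.
</p>

```python
class Account:
    """Hypothetical account service that owns both its data and its rules."""

    def __init__(self, balance):
        self._balance = balance

    # "Ask" style: expose state and let every caller re-implement the rule.
    def get_balance(self):
        return self._balance

    def set_balance(self, value):
        self._balance = value

    # "Tell" style: the caller states intent; the decision stays in here.
    def withdraw(self, amount):
        if amount > self._balance:
            return False
        self._balance -= amount
        return True


def withdraw_by_asking(account, amount):
    """Ask, decide, then change state: three steps across the boundary."""
    balance = account.get_balance()
    if balance >= amount:
        account.set_balance(balance - amount)
        return True
    return False
```

<p>
Both paths produce the same result in this toy. The asking version,
though, needs two round trips and copies the overdraft rule into every
caller.
</p>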
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Data Flow Diagrams</h2>
<div class="outline-text-2" id="text-3">
<p>
If you ask someone who isn't trained in UML to draw a system's
architecture, they will often draw something close to a
<a href="https://en.wikipedia.org/wiki/Data_flow_diagram">Data Flow Diagram</a>. This diagram shows data repositories and the
transformation processes that populate them. DFDs are a very helpful
tool because they force you to ask a few really key questions:
</p>

<ol class="org-ol">
<li>Where did that information come from?
</li>
<li>How did it get there?
</li>
<li>Who updates it?
</li>
<li>Who uses the data we produce?
</li>
</ol>

<p>
In particular, answering that last question forces you to think about
whether you're producing the right data for the downstream consumer.
</p>
</div>
</div>
]]></content></entry><entry><title>Generalized Minimalism</title><link href="https://michaelnygard.com/blog/2016/02/generalized-minimalism/"/><id>https://michaelnygard.com/blog/2016/02/generalized-minimalism/</id><published>2016-02-29T10:22:43-06:00</published><updated>2016-02-29T10:22:43-06:00</updated><content type="html"><![CDATA[<p>My daily language is Clojure. One of the joys of working in Clojure is
its great core library. The core library has a wealth of functions
that apply broadly across data structures. A typical function looks
like this:</p>
<pre tabindex="0"><code>(defn nthnext
  &#34;Returns the nth next of coll, (seq coll) when n is 0.&#34;
  {:added &#34;1.0&#34;
   :static true}
  [coll n]
    (loop [n n xs (seq coll)]
      (if (and xs (pos? n))
        (recur (dec n) (next xs))
        xs)))
</code></pre><p>I want to call your attention to two specific forms. The &ldquo;seq&rdquo; function
works on any &ldquo;Seqable&rdquo; collection type. (N.B.: It has special cases
for other types, including some to make Java interop more
pleasant. But the core behavior is about Seqable.) The &ldquo;next&rdquo; function
is similar: it works on anything that already is a Seq or anything
that can be made into a Seq.</p>
<p>This provides a nice degree of abstraction and through that,
generality.</p>
<p>Pretty much all of the core data types either implement ISeq or
Seqable. That means I can call &ldquo;seq&rdquo;, &ldquo;next&rdquo;, and &ldquo;nth&rdquo; on any of
them. Other data types can be brought into the fold by extending one
of those interfaces to them. We extend the data to meet the core
functions, instead of overloading functions for data types.</p>
<h2 id="yagni-isnt-about-being-specific">YAGNI Isn&rsquo;t About Being Specific</h2>
<p>Under this approach, writing a general function is both simpler and
easier than writing a specific one.</p>
<p>For example, suppose I need to do that classic example of trivial
functionality: summing a list of integers. The most natural way for me
to write that is like this:</p>
<pre tabindex="0"><code>(reduce + 0 xs)
</code></pre><p>That is both simple and general. But it doesn&rsquo;t meet the spec I said!
It sums <em>any</em> numeric type, not just integers. If I decide that I
really must restrict it to integers, I have to add code.</p>
<pre tabindex="0"><code>(assert (every? integer? xs))
(reduce + 0 xs)
</code></pre><p>This is a pattern I find pretty often when working in Clojure. When I
generalize, I do it by removing special cases. This goes hand-in-hand
with decomposing behavior into smaller and smaller units. As each unit
gets smaller, I find it can be more general.</p>
<p>Here&rsquo;s a less trivial example. Today, I&rsquo;m working on a library we call
Vase. (See <a href="https://www.youtube.com/watch?v=BNkYYYyfF48">Paul deGrandis&rsquo; talk on data-driven
systems</a> for more about
Vase.) In particular, I&rsquo;m updating it to work with a new routing
syntax in <a href="https://pedestal.io">Pedestal</a>. With the new routing
syntax, we can build routes from ordinary Clojure data&ndash;no more need
for oddly-placed syntax-quoting.</p>
<p>One of the core concepts in Pedestal is the &ldquo;interceptor&rdquo;. Interceptors
fulfill the same role as middleware in Ring. (One difference:
interceptors are data structures that contain functions. Interceptors
compose by making a vector of data, whereas Ring middleware composes
by creating function closures. I find it easier to debug a stack of
data than a stack of opaque closures.) Any particular route in
Pedestal will have a list of interceptors that apply to that route.</p>
<p>When a service that uses Pedestal supplies interceptors, it composes a
list of them. Suppose I want to make a convenience function that helps
application developers build up that list. What would I need to do?</p>
<p>You probably already figured out that any such &ldquo;convenience&rdquo; functions
I could create would basically duplicate core functions, but with
added restrictions. Instead of &ldquo;cons&rdquo;, &ldquo;conj&rdquo;, &ldquo;take&rdquo;, and &ldquo;drop&rdquo;, I&rsquo;d
have to create &ldquo;icons&rdquo;, &ldquo;iconj&rdquo;, &ldquo;itake&rdquo;, and &ldquo;idrop&rdquo;. What a waste.</p>
<p>I have to ask myself, &ldquo;Do I need some special behavior here?&rdquo; And the
answer is &ldquo;YAGNI.&rdquo;</p>
<h2 id="yagni-is-about-adding-stuff">YAGNI Is About Adding &ldquo;Stuff&rdquo;</h2>
<p>YAGNI is commonly understood to mean &ldquo;don&rsquo;t generalize until you need
to.&rdquo; In some languages and libraries, I suppose that&rsquo;s the right
read. In my world, though, it is <em>specializing</em> that requires adding
stuff. So I often call YAGNI if someone tries to make a thing less
general than it could be.</p>
<p>Small functions that operate on abstractions instead of concrete
types are both general and simple.</p>
]]></content></entry><entry><title>Redeeming the Original Sin</title><link href="https://michaelnygard.com/blog/2016/02/redeeming-the-original-sin/"/><id>https://michaelnygard.com/blog/2016/02/redeeming-the-original-sin/</id><published>2016-02-12T10:24:38-06:00</published><updated>2016-02-12T10:24:38-06:00</updated><content type="html"><![CDATA[
<p>
While reading <a href="http://dtrace.org/blogs/bmc/">Bryan Cantrill's</a> slides from <a href="http://www.meetup.com/papers-we-love/">Papers We Love NYC</a>, I was
struck by something. One of the very first slides says:
</p>

<blockquote>
<p>
The traditional UNIX security model is simple but inexpressive.
</p>
</blockquote>

<p>
The papers go on to describe a progression of techniques to isolate
processes from the host environment to greater and greater degrees. It
began with the ancient precursor 'chroot' and continued through Jails
and Zones. Each builds upon the previous work to improve the degree of
isolation.
</p>

<p>
We've seen a parallel series of efforts in the Linux realm with
virtual machines and containers.
</p>

<p>
However!
</p>

<p>
All of these are introduced to restore the degree of isolation and
resource control that was originally present in mainframe operating
systems. Furthermore, it was the model that Multics was meant to
supply.
</p>

<p>
Unix started with a simplified security model, meant for single user
machines. It was "dumbed down" enough to be easy to implement on the
limited machines of the day.
</p>

<p>
Zones, VMs, containers&#x2026; they're all ways to redeem Unix from its
original sin. Maybe what we should look at is a better operating
system?
</p>
]]></content></entry><entry><title>What's Lost With a DevOps Team</title><link href="https://michaelnygard.com/blog/2016/01/whats-lost-with-a-devops-team/"/><id>https://michaelnygard.com/blog/2016/01/whats-lost-with-a-devops-team/</id><published>2016-01-27T15:03:28-06:00</published><updated>2016-01-27T15:03:28-06:00</updated><content type="html"><![CDATA[
<p>
Please understand, dear Reader, that I write this with positive
intention. I'm not here to impugn any person or organization. I want
to talk about some decisions and their natural consequences. These
consequences seem negative to me and after reading this post you may
agree.
</p>

<p>
When an established company faces a technology innovation, they often
create a new team to adopt and exploit that innovation. During my
career, I've seen this pattern play out with microcomputers,
client/server architecture, open systems, web development, agile
development, cloud architecture, NoSQL, and DevOps. Perhaps we can
explore the pros and cons of that overall approach in some other
post. For now, I want to specifically address the DevOps team.
</p>

<p>
A DevOps team gets created as an intermediary between development and
operations. This is especially likely when dev and ops report through
different management chains. That is to say, in a
functionally-oriented structure. In a product-oriented structure, it
is less likely.
</p>

<p>
This intermediary team gets tasked with automating releases and
deployments. They are the ones to adopt some configuration-as-code
platform. Sometimes they are also tasked with building an internal
platform-as-a-service, but that more often falls to the infrastructure
and operations teams.
</p>

<p>
So the DevOps team has development as their customer. Operations has
the DevOps team as their customer. Work flows from development,
through the tools created by the DevOps team, and into production. It
would seem to capture the benefits of automation: it becomes
predictable, repeatable, and safe.
</p>

<p>
All of that is true. However, even though this is an improvement, it
misses out on even greater improvements that could be realized.
</p>

<p>
The key problem is the unclosed feedback loop. When developers are
directly exposed to production operations, they learn. Sometimes they
learn from negative feedback: getting woken up for support calls,
debugging performance problems, or that horrible icy feeling in your
stomach when you realize that you just shut down the wrong database in
production.
</p>

<p>
With a DevOps team sitting between development and operations, the
operations team remains in the "learning position." But they lack the
ability to directly improve the systems. Suppose a log message is
ambiguous. If the operator who sees it can't directly change the
source code, then the message will never get corrected. (It's
important, but small&#x2026; exactly the thing least likely to be worth
filing a change request for.)
</p>

<p>
Over longer time spans, the things we learn from production should
influence the entire architecture: from technology choices to code
patterns and common libraries. A DevOps team sitting between
development and operations impedes that learning.
</p>

<p>
DevOps is meant to be a style of interaction: direct collaboration
between development and operations. A team in between that automates
things is a tools team. It's OK to call it a tools team. Tools are a
good thing, despite what corporate budgeting seems to say these
days.
</p>

<p>
Instead of creating a flow from development to DevOps to operations,
consider putting development, tools, and operations all together and
giving them the same goals. They should be collaborators working
shoulder-to-shoulder rather than work stations in a software factory.
</p>
]]></content></entry><entry><title>Give Them The Button!</title><link href="https://michaelnygard.com/blog/2015/10/give-them-the-button/"/><id>https://michaelnygard.com/blog/2015/10/give-them-the-button/</id><published>2015-10-23T10:37:45-05:00</published><updated>2015-10-23T10:37:45-05:00</updated><content type="html"><![CDATA[
<p>
Here's a syllogism for you:
</p>

<ul class="org-ul">
<li>Every technical review process is a queue
</li>
<li>Queues are evil
</li>
<li>Therefore, every review process is evil
</li>
</ul>

<p>
Nobody likes a review process. Teams who have to go through the review
look for any way to dodge it. The reviewers inevitably delegate the
task downward and downward.
</p>

<p>
The only reason we ever create a review process is because we think
someone else is going to feed us a bunch of garbage. They get created
like this:
</p>

<img src="/images/blog/2015-10-23-give-them-the-button/blame-loop.png">
<div class="caption"></div>


<p>
It starts when someone breaks a thing that they can't or aren't
allowed to fix. The responsibility for repair goes to a different
person or group. That party shoulders both responsibility for fixing
the thing and also blame for allowing it to get screwed up in the
first place.
</p>

<p>
(This is an unclosed feedback loop, but it is very common. Got a
separate development and operations group? Got a separate DBA group
from development or operations? Got a security team?)
</p>

<p>
As a followup, to ensure "THIS MUST NEVER HAPPEN AGAIN" the
responsible party imposes a review process.
</p>

<p>
Most of the time, the review process succeeds at preventing the same
kind of failure from recurring. The resulting dynamic looks like this:
</p>

<img src="/images/blog/2015-10-23-give-them-the-button/review-in-place.png">
<div class="caption"></div>


<p>
The hidden cost is the time lost. Every time that review process has
to go off, the creator must prepare secondary artifacts: some kind of
submission to get on the calendar, a briefing, maybe even a
presentation. All of these are non-value-adding to the end
customer. Muda. Then there's the delay on the review meeting or email
itself. Consider that there is usually not just one review but several
needed to get a major release out the door and you can see how release
cycles start to stretch out and out.
</p>

<p>
Is there a way we can get the benefit of the review process without
incurring the waste?
</p>

<p>
Would I be asking the question if I didn't have an answer?
</p>

<p>
The key is to think about what the reviewer actually does. There are
two possibilities:
</p>

<ol class="org-ol">
<li>It's purely a paperwork process. I'll automate this away with a
script that makes a PDF and automatically emails it to whomever
necessary. Done.
</li>
<li>The reviewer applies knowledge and experience to look for harmful
situations.
</li>
</ol>

<p>
Let's talk mostly about the latter case. A lot of our technology has
land mines. Sometimes that is because we have very general purpose
tools available. Sometimes we use them in ways that would be OK in a
different situation but fail in the current one. Indexing an RDBMS
schema is a perfect example of this.
</p>

<p>
Sometimes, it's also because the creators just lack some experience or
education. Or the technology just has giant, truck-sized holes in it.
</p>

<p>
Whatever the reason, we expect that the reviewer is adding
intelligence, like so:
</p>

<img src="/images/blog/2015-10-23-give-them-the-button/injecting-experience.png">
<div class="caption"></div>


<p>
This benefits the system, but it could be much better. Let's look at
some of the downsides:
</p>

<ul class="org-ul">
<li>Throughput is limited to the reviewer's bandwidth. If they truly
have a lot of knowledge and experience, then they won't have much
bandwidth. They'll be needed elsewhere to solve problems.
</li>
<li>The creator learns from the review meetings&#x2026; by getting dinged for
everything wrong. Not a rewarding process.
</li>
<li>It is vulnerable to the reviewer's availability and presence.
</li>
</ul>

<p>
I'd much rather see the reviewer codify that knowledge by building it
into automation. Make the automation enforce the practices and
standards. Make it smart enough to help the creator stay out of
trouble. Better still, make it smart enough to help the creator solve
problems successfully instead of just rejecting low quality inputs.
</p>

<img src="/images/blog/2015-10-23-give-them-the-button/double-loop-learning.png">
<div class="caption"></div>


<p>
With this structure, you get much more leverage from the responsible
party. Their knowledge gets applied across every invocation of the
process. Because the feedback is immediate, the creator can learn much
faster. This is how you build organizational knowledge.
</p>
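
<p>
Here is one way that codified knowledge might look, as a sketch
only. The two rules and the configuration keys
(<code>timeout_seconds</code>, <code>replicas</code>, <code>env</code>) are invented for
illustration. The point is that each rule returns advice the creator
can act on immediately, rather than a bare rejection.
</p>

```python
def check_timeout(config):
    """Hypothetical rule: every outbound call needs a timeout."""
    if config.get("timeout_seconds") is None:
        return "Set timeout_seconds; calls without a timeout can hang forever."
    return None


def check_replicas(config):
    """Hypothetical rule: production services run more than one instance."""
    replicas = config.get("replicas", 1)
    if config.get("env") == "production" and not (replicas >= 2):
        return "Run at least 2 replicas in production."
    return None


# Each function captures one thing a human reviewer used to look for.
RULES = [check_timeout, check_replicas]


def review(config):
    """Run every codified rule and return advice, not just a verdict."""
    advice = [msg for msg in (rule(config) for rule in RULES) if msg]
    return {"approved": not advice, "advice": advice}
```

<p>
When the responsible party learns something new, it becomes one more
function in the list, and every later submission benefits from it.
</p>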

<p>
Some technology is not amenable to this kind of automation. For
example, parsing some developer's DDL to figure out whether they've
indexed things properly is a massive undertaking. To me, that's a
sufficient reason to either change how you use the technology or just
change technology. With the DDL, you could move to a declarative
framework for database changes (e.g., Liquibase). Or you could use
virtualization to spin up a test database, apply the change, and see
how it performs.
</p>

<p>
Or you can move to a <a href="http://www.datomic.com/">database</a> where the schema is itself data,
available for query and inspection with ordinary program logic.
</p>
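
<p>
To illustrate why schema-as-data helps, here is a small sketch. It
assumes a made-up declarative change format (a plain dictionary, not
the actual Liquibase or Datomic representation) and checks one rule a
DBA might care about: every foreign-key column should lead some index.
</p>

```python
# Hypothetical declarative change set: the schema is plain data.
CHANGE = {
    "table": "orders",
    "columns": ["id", "customer_id", "placed_at"],
    "foreign_keys": ["customer_id"],
    "indexes": [["id"]],
}


def unindexed_foreign_keys(change):
    """Return foreign-key columns that no index starts with."""
    leading = {index[0] for index in change["indexes"] if index}
    return [col for col in change["foreign_keys"] if col not in leading]
```

<p>
Running <code>unindexed_foreign_keys(CHANGE)</code> flags <code>customer_id</code>. Getting
the same answer out of free-form DDL would first require a SQL parser.
</p>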

<p>
The automation may not be able to cover 100% of the cases in
general-purpose programming. That's why local context is important. As
long as there is at least one way to solve the problem that works with
the local infrastructure and automation, then the problem can be
solved. In other words, we can constrain our languages and tools to
fit the automation, too.
</p>

<p>
Finally, there may be a need for an exception process, where the
automation can't decide whether something is viable or not. That's a
great time to get the responsible party involved. That review will
actually add value because every party involved will learn. Afterward,
the RP may improve the automation or may even improve the target
system itself.
</p>

<p>
After all, with all the time that you're <b>not</b> spending in pointless
reviews, you have to find something to do with yourself.
</p>

<p>
Happy queue hunting!
</p>
]]></content></entry><entry><title>C9D9 on Architecture for Continuous Delivery</title><link href="https://michaelnygard.com/blog/2015/10/c9d9-on-architecture-for-continuous-delivery/"/><id>https://michaelnygard.com/blog/2015/10/c9d9-on-architecture-for-continuous-delivery/</id><published>2015-10-18T09:29:01-05:00</published><updated>2015-10-18T09:29:01-05:00</updated><content type="html"><![CDATA[
<p>
Every single person I've heard talk about Continuous Delivery says you
have to change your system's architecture to succeed with it. Despite
that, we keep seeing "lift and shift" efforts. So I was happy to be
invited to join a panel to discuss architecture for Continuous
Delivery. We had an online discussion last Tuesday on the <a href="http://electric-cloud.com/lp/continuous-discussions/">C9D9</a> series,
hosted by <a href="http://electric-cloud.com/powering-continuous-delivery/">Electric Cloud</a>.
</p>

<p>
They made the recording available immediately after the panel, along
with a shiny new embed code.
</p>

<iframe width="560" height="315"
src="https://www.youtube.com/embed/Vi-AdegLC_4" frameborder="0"
allowfullscreen></iframe>

<p>
Best of all, they supplied a transcript, so I can share some excerpts
here. (Lightly edited for grammar, since I have relatives who are
editors and I must face them with my head held high.)
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Pipeline Orchestration</h2>
<div class="outline-text-2" id="text-1">
<p>
It's easy to focus on the pipeline as the thing that delivers code
into production. But I want to talk about two other central roles that
it plays. One, with regards to risk management. To me the pipeline is
not so much about ushering code out to production, but it's about
finding every opportunity to reject a harmful change, or a bad change,
before letting it get into production. So I view the pipeline as an
essential part of risk management.
</p>
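
<p>
Seen that way, a pipeline is a chain of chances to say no. A minimal
sketch, with hypothetical stage names and fields on the change record:
</p>

```python
def run_pipeline(change, stages):
    """Run stages in order; stop at the first one that rejects the change."""
    for name, check in stages:
        if not check(change):
            return "rejected at " + name
    return "released"


# Hypothetical stages; each one is an opportunity to reject the change.
STAGES = [
    ("unit tests", lambda change: change["tests_pass"]),
    ("security scan", lambda change: not change["known_vulnerabilities"]),
    ("canary", lambda change: 0.01 >= change["canary_error_rate"]),
]
```

<p>
Adding a stage never makes a release more likely. It only adds another
place where a bad change can be caught.
</p>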

<p>
I've also had a lot of lean training, so I'd look on the deployment
pipeline as the value stream that developers use to deliver value to
their customers. In that respect we need to think about the pipeline
as production-grade infrastructure, and we need to treat it with
production-like SLAs.
</p>
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Cattle, Not Pets</h2>
<div class="outline-text-2" id="text-2">
<p>
I think a lot has been said about "cattle versus pets" over the last
ten years or so. I just want to add one thing - the real challenge is
identity. There are a ton of systems and frameworks that implicitly
assume stable identity on machines. Particularly a lot of distributed
software toolkits. When you do have the cattle model, a machine identity
may disappear and never come back again. I just really hope you're
not building up a queue of undelivered messages for that machine.
</p>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Service Orientation and Decoupling</h2>
<div class="outline-text-2" id="text-3">
<p>
Having teams running in parallel and being able to develop more or less
independently - I talk about team scale autonomy. But if there are
very long builds, large artifacts, and a large number of artifacts, I
regard that as the consequence of using languages and tools that are
early bound and early linked. I don’t think it's any accident that the
people I heard of first doing continuous delivery were using PHP. You
can regard each PHP file as its own deployable artifact, and so things
move very quickly. If everything we wrote was extremely late bound, then
our deployment would be an <code>rsync</code> command. So to an extent, breaking
things down into services is a response to large artifacts, long build
times, that's one side of that.
</p>

<p>
The other side is team scale autonomy and the fact that you can't beat
Conway’s Law and that absolutely holds true. (Conway’s
Law: an organization is constrained to produce software that
recapitulates the structure of the organization itself. If you have
four teams working on a compiler, you're going to have a four pass
compiler.)
</p>

<p>
Now, when we talk about decoupling, I need to talk about two different
types of decoupling, both important.
</p>

<p>
The bigger your team gets, the more communication overhead goes up. We
have known that since the 1960s, so breaking that down makes
sense. But then we have to recompose things at runtime and that's when
coupling becomes a big issue. Operational coupling happens minute by
minute by minute. If I have service A calling service B, service B
goes down, I have to have some response. If I don't do anything else,
service A is also going to go down. So I need to build in some
mechanisms to provide operational decoupling, maybe that's a cache,
maybe it’s timeouts, maybe it's a circuit breaker, something along
those lines, to protect one service from the failure of another
service.
</p>
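
<p>
As one illustration of such a mechanism, here is a bare-bones circuit
breaker. It is a sketch under simplifying assumptions (single-threaded,
a fixed fallback value, an injectable clock for testing), not a
production implementation.
</p>

```python
import time


class CircuitBreaker:
    """Fail fast after repeated failures, then retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        """Call func unless the breaker is open; return fallback on trouble."""
        if self.opened_at is not None:
            if self.reset_after > self.clock() - self.opened_at:
                return fallback  # open: do not even try service B
            self.opened_at = None  # cool-down over: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0  # a success closes the breaker again
        return result
```

<p>
Service A would wrap each call to service B in <code>breaker.call</code>,
passing a cached or default answer as the fallback. This sketch only
reacts to exceptions, so it still needs a timeout on the underlying
call to turn a slow response into a failure.
</p>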

<p>
It's not just the failure of the service! A deployment to the other
service looks exactly like a failure from the perspective of the
consumer. It's simply not responding to requests within an acceptable
time.
</p>

<p>
So we have to pay attention to the operational decoupling.
</p>

<p>
Semantic coupling is even more insidious, and that's what plays out
over a span of months and years. We talk about API versioning quite a
bit, but there are other kinds of semantic coupling that creep in. I've
been harping a lot lately about identifiers. If I have to pass an
itemID to another system then I'm sort of implicitly saying there is
one universe of itemIDs and that system has them all, and I can only
talk to that system for items with those IDs.
</p>

<p>
Similarly with many services that we create, we create the service as
though there is one instance of the service. We'd be better off
creating the code that can instantiate that service many times for
many consumers. So if you create a calendar service, don’t make one
calendar that everyone has eventIDs on. Make a calendar service where
we can ask for a new calendar and it gives you back a URL for a whole
new calendar that is yours and only yours. This is the way you would
build it if you were building a SaaS business. That's how you would
need to think about the decoupled services internally.
</p>
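
<p>
A small sketch of that calendar idea. The class, the methods, and the
<code>calendars.example.com</code> base URL are all hypothetical. What matters
is that a consumer asks for a new calendar and gets back a URL that
scopes everything it does afterward.
</p>

```python
import uuid


class CalendarService:
    """Hypothetical multi-instance service: every caller gets its own calendar."""

    def __init__(self, base_url="https://calendars.example.com"):
        self.base_url = base_url
        self._calendars = {}

    def create_calendar(self):
        """Create a fresh, empty calendar and return its URL."""
        url = self.base_url + "/calendars/" + uuid.uuid4().hex
        self._calendars[url] = []
        return url

    def add_event(self, calendar_url, title):
        """Add an event; the returned URL is scoped to that one calendar."""
        events = self._calendars[calendar_url]
        events.append(title)
        return calendar_url + "/events/" + str(len(events))

    def events(self, calendar_url):
        """List the event titles on one calendar only."""
        return list(self._calendars[calendar_url])
```

<p>
Event identifiers only mean something relative to their calendar's
URL, so there is no single universe of eventIDs for consumers to
couple to.
</p>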
</div>
</div>

<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Messaging and Data Management</h2>
<div class="outline-text-2" id="text-4">
<p>
If I'm truly deploying continuously then I've got version N and
version N+1 running against the same data source. So I need some way
to accommodate that. In older less-flexible kinds of databases, that
means triggers, shims, extra views, that kind of scaffolding.
</p>

<p>
I heard a great story, I think it's from Pinterest at Velocity a
couple of years back. They had started with a monolithic user database
and found they needed to split the table. After they already had 60
million users! But they were able to make many small deployments that
each added kind of one step for an incremental migration. And once
they got that in place, they let it sit for three months, at the end
of that they found who was left and did a batch migration of
those. Then they did a series of incremental deployments to remove the
extra data management stuff.
</p>
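
<p>
The details of that migration aren't given here, so the following is
only a guess at the general shape: migrate records lazily as they are
touched, write to both stores while versions N and N+1 coexist, and
batch-copy the stragglers at the end. The two dictionaries stand in for
the old and new tables.
</p>

```python
# Stand-ins for the old monolithic table and the new split table.
old_users = {1: {"name": "ann", "bio": "hi"}, 2: {"name": "bob", "bio": "yo"}}
new_users = {}


def read_user(user_id):
    """Read through the new store, migrating the record on first touch."""
    if user_id not in new_users:
        new_users[user_id] = dict(old_users[user_id])
    return new_users[user_id]


def write_user(user_id, record):
    """Version N+1 writes to both stores so version N keeps working."""
    new_users[user_id] = dict(record)
    old_users[user_id] = dict(record)


def batch_migrate_remaining():
    """After the waiting period, copy whoever was never touched."""
    remaining = [uid for uid in old_users if uid not in new_users]
    for uid in remaining:
        new_users[uid] = dict(old_users[uid])
    return remaining
```

<p>
Each function could ship as its own small deployment. Once the batch
step reports nobody left, later deployments can drop the dual write
and the old table.
</p>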

<p>
So it's one of those cases - doing continuous delivery both
necessitates that you're more sophisticated about your data changes,
but it also gives you new tools to accomplish those changes.
</p>

<p>
There is a wide crop of databases that don't require that kind of
care and feeding when you make deployments. If you are truly
architecting for operational ease and delivery, then that might be a
sufficient reason to choose one of the newer databases over one of the
less flexible relational stores.
</p>
</div>
</div>

<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Conclusion</h2>
<div class="outline-text-2" id="text-5">
<p>
The C9D9 discussion was quite enjoyable. The hosts ran the panel well,
and even though all of us are pretty long-winded, nobody was able to
filibuster. I'll be happy to join them again for another discussion
some time.
</p>
</div>
</div>
]]></content></entry><entry><title>Software Eats the World</title><link href="https://michaelnygard.com/blog/2015/10/software-eats-the-world/"/><id>https://michaelnygard.com/blog/2015/10/software-eats-the-world/</id><published>2015-10-03T18:10:31-04:00</published><updated>2015-10-03T18:10:31-04:00</updated><content type="html"><![CDATA[
<p>
During this morning's drive, I crossed several small overpasses. It
reminded me that the <a href="http://www.asce.org/">American Society of Civil Engineers</a> rated more
than 20% of our bridges as structurally deficient or functionally
obsolete. That got me to thinking about how we even know how many
bridges there are in a country as large as the U.S.
</p>

<p>
Some time in the past, it would have required an army of people to go survey
all the roads, looking for bridges and adding them to a ledger. Now,
I'm sure it's a query in a geographical database. The information had
to be entered at least once, but now that it's in the database we
don't need people to go wandering about with clicker counters.
</p>

<p>
Instead of clipboards and paper, the bridge survey needed data import
from thousands of state and county GIS databases. That means coders to
write the import jobs and DBAs to set up the target systems. It needed
queries to count up the bridges and cross-check with inspection
reports. So that requires more coders and maybe some UX designers for
data visualization.
</p>

<p>
Back in 2011, Marc Andreessen said "software is eating the world".
There's no reason to think that's going to slow down soon. And
as software eats the world, work becomes tech work.
</p>
]]></content></entry><entry><title>Microservices versus Lean</title><link href="https://michaelnygard.com/blog/2015/08/microservices-versus-lean/"/><id>https://michaelnygard.com/blog/2015/08/microservices-versus-lean/</id><published>2015-08-11T06:38:51-05:00</published><updated>2015-08-11T06:38:51-05:00</updated><content type="html"><![CDATA[
<p>
Back in April, I had the good fortune to speak at Craft Conf in lovely
Budapest. It's a fantastic conference that I <a href="http://craft-conf.com/2016">would recommend</a>.
</p>

<p>
During that conference, Randy Shoup talked about his experience
migrating from monoliths to microservices at eBay and Google. David,
one of the audience members, asked an interesting question at the end
of Randy's talk. (I'm sorry that I didn't get the full name of the
questioner&#x2026; if you are reading this, please leave a comment to let
me know who you are.)
</p>

<p>
"Isn't the concept of microservices contradictory with the lean/agile
principles of a) collective code ownership, and b) optimizing whole
processes and systems instead of small units?"
</p>

<p>
Randy already did a great job of responding to the first part of that
question, so please view the video to hear his answer there. He didn't
have time to respond to the second part so I don't know what <span class="underline">his</span>
answer would be, but I will tell you mine.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Start From The "Why"</h2>
<div class="outline-text-2" id="text-1">
<p>
Let's start by answering the question with a question. Why do we
pursue Lean development in the first place? Your specific answer may
vary, but I bet it relates back to "better use of capital" or "turning
ideas into profit sooner." Both of these are statements about
efficiency: efficient use of capital and efficient use of time.
</p>

<p>
One of the first Lean changes is to reorganize people and processes around
the value streams. That is a big upheaval! It often means moving from
a functional structure to a cross-functional structure. (And I <span class="underline">don't</span>
mean matrixing!) Just moving to that cross-functional structure will deliver big
improvements to cycle time and process efficiency. After that, teams
in each value stream can further optimize to reduce their cycle time.
</p>

<p>
The next focus is on reducing "inventory." For development, we
consider any unreleased code or stories to be inventory. So,
work-in-progress code, features that have been finished but not
deployed, and the entirety of the backlog all count as inventory.
</p>

<p>
Reducing inventory always has the effect of making more problems
visible. Maybe there are process bottlenecks to address, or maybe
there are high defect rates at certain steps (like failed deployments
to production, or a lot of rejected builds.)
</p>

<p>
This is the start of the real optimization loop: reduce the inventory
until a new problem is revealed. Solve the problem in a way that
allows you to further reduce inventory.
</p>
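<p>
One way to see why draining inventory shortens cycle time is Little's
Law, which relates average work in progress, throughput, and cycle
time. The sketch below is only an illustration. The function name and
the numbers are made up for the example.
</p>

```python
def average_cycle_time(work_in_progress: float, throughput_per_week: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput_per_week


# Hypothetical team: 40 unreleased stories, 5 stories released per week.
before = average_cycle_time(40, 5)
# Same throughput, but inventory drained to 10 stories.
after = average_cycle_time(10, 5)
```

<p>
With the same throughput, a quarter of the inventory means a quarter
of the average wait. That is before you fix any of the problems the
smaller inventory reveals.
</p>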
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Which is the Value Stream?</h2>
<div class="outline-text-2" id="text-2">
<p>
David's question seems to originate from the view that the value
stream is the request handling process. So if a single request hits a
dozen services, then one value stream cuts across multiple
organizational boundaries. That would indeed be problematic.
</p>

<p>
However, I think the more useful viewpoint is that the value stream is
"the software delivery process" itself. This is based on the premise
that the value stream delivers "things customers would pay for." Well,
a customer wouldn't pay for a single request to be handled. They
would, however, pay for a whole new feature in your product.
</p>

<p>
Viewed that way, each service in production is the terminal point of
its own value stream. So, Lean does not conflict with a microservice
architecture. But could a microservice architecture conflict with
Lean?
</p>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Return to "Why"</h2>
<div class="outline-text-2" id="text-3">
<p>
We asked, "Why Lean?" Now, let's ask "Why microservices?" The answer
is always "We want to preserve flexibility as we scale the
organization." Microservices are about embracing change at a
macroscopic level. That has nothing to do with capital efficiency!
</p>

<p>
So are these ideas contradictory? To answer that, I need to dig into
another aspect of Lean efforts: infrastructure.
</p>
</div>
</div>

<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Efficiency, Specialization, and Infrastructure</h2>
<div class="outline-text-2" id="text-4">
<p>
In the early days of aviation, airplanes were made of canvas and
wood. They could land at pretty much any meadow that didn't have cows
or sheep in the way. Pilots navigated by sight and landmarks, including
<a href="http://www.messynessychic.com/2013/11/15/the-forgotten-giant-arrows-that-guide-you-across-america/">giant concrete arrows on the ground</a>. Planes couldn't go very fast, fly
very high, carry many passengers, or haul a lot of cargo.
</p>

<p>
The maximum takeoff weight of an Airbus A380 is now
1.2 million pounds. It requires a specially reinforced runway of at
least 9,020 feet and typically carries 525 passengers. It flies at
an altitude of more than 8 miles. This is not an airplane that you
navigate by eyeballing landmarks.
</p>

<p>
This aircraft is amazingly efficient. Achieving that efficiency
requires extensive infrastructure. Radar on the plane and on the
ground. Multiple comms systems. An extensive array of radio beacons
and air traffic controllers on the ground and dozens of satellites in
space, all sending signals to the on-board network of flight
management systems. Billions of lines of code running across these
devices. Airports with jetbridges that have multiple connections to
the aircraft. Special vehicles to tow the plane, push the plane out,
haul bags, fuel, de-ice, remove waste water&#x2026; the list goes on and
on.
</p>

<p>
In short, this is <span class="underline">not</span> just an airplane. It is part of an elaborate air
transportation system.
</p>

<p>
It should be pretty obvious that the incredible efficiency of modern
airliners comes at the expense of flexibility. Not just in terms of
the individual aircraft, but in terms of changes to <span class="underline">any</span> part of the
whole system.
</p>

<p>
You can see this play out in any technological arena. As we increase
the systems' efficiency, we accumulate infrastructure that both
enables the efficient operation and also constrains the system to its
current mode of operation.
</p>

<p>
In Lean initiatives, there is a gradual shift from draining inventory
and solving existing problems into creating infrastructure to add
efficiency. It's not a bright line or a milestone to reach, but it is
noticeable. As you get further into the infrastructure-efficiency
realm, you must recognize two effects:
</p>

<ul class="org-ul">
<li>You will get better at certain actions.
</li>
<li>Other actions become much, much harder.
</li>
</ul>

<p>
As an example, suppose you are optimizing the value stream for
delivering applications. (A reasonable thing to do.) You will
eventually find that you need an automated way to move code into
production. You may choose to build golden master images, or automate
deployment via scripts, or use Docker to deploy the same configuration
everywhere. You may commit to vSphere, Xen, OpenStack, or whatever. As
you make these decisions, you make it easier to move code using the
chosen stack and much, much harder to do it any other way.
</p>
</div>
</div>

<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5">Full Circle</h2>
<div class="outline-text-2" id="text-5">
<p>
So, with all that background, I'm finally ready to address the
question of whether microservices and Lean are in conflict.
</p>

<p>
Given that:
</p>

<ol class="org-ol">
<li>You want maneuverability from microservices.
</li>
<li>Your value stream is delivering features into production.
</li>
<li>You pursue Lean past the inventory-draining phase.
</li>
<li>Further efficiency improvements require you to commit to infrastructure and
an extended system.
</li>
<li>That extended system will <span class="underline">not</span> be easy to change, no matter what
you choose or how you build it.
</li>
</ol>

<p>
Then the answer is "yes."
</p>
</div>
</div>
]]></content></entry><entry><title>Development is Production</title><link href="https://michaelnygard.com/blog/2015/08/development-is-production/"/><id>https://michaelnygard.com/blog/2015/08/development-is-production/</id><published>2015-08-06T08:06:33-05:00</published><updated>2015-08-06T08:06:33-05:00</updated><content type="html"><![CDATA[
<p>
When I was at Totality, we treated an outage in our customers' content
management system as a Sev 2 issue. It ranked right behind "Revenue
Stopped" in priority. Content management is critical to the merchants,
copy writers, and editors. Without it, they cannot do their jobs.
</p>

<p>
For some reason, we always treated dev environment or QA environment
issues as a Sev 3 or 4, with the "when I get around to it" SLA. I've
come to believe that was incorrect.
</p>

<p>
The development environment and the QA environment are the critical
tools needed for developers to do their jobs. When an environment is
broken, it means those people are less effective. They might even be
idle.
</p>

<p>
Why would you treat the tools developers use as any less critical?
And yet, I see one company after another with unreliable, broken,
half-integrated QA environments. They've got bad data, unreliable
items, and manual test setup.
</p>

<p>
If any stage of the development pipeline is broken, that's exactly
equivalent to the content pipeline being broken.
</p>

<p>
Development <b>is</b> production.
</p>

<p>
QA <b>is</b> production.
</p>

<p>
Your build pipeline <b>is</b> production.
</p>

<p>
Treat them accordingly!
</p>
]]></content></entry><entry><title>The Fear Cycle</title><link href="https://michaelnygard.com/blog/2015/07/the-fear-cycle/"/><id>https://michaelnygard.com/blog/2015/07/the-fear-cycle/</id><published>2015-07-15T07:11:38-05:00</published><updated>2015-07-15T07:11:38-05:00</updated><content type="html"><![CDATA[
<p>
Once you begin to fear your technology, you will shortly have cause to
fear it even more.
</p>

<p>
The Fear Cycle goes like this:
</p>

<ol class="org-ol">
<li>Small changes have unpredictable, scary, or costly results.
</li>
<li>We begin to fear making changes.
</li>
<li>We try to make every change as small and local as possible.
</li>
<li>The code base accumulates warts, knobs, and special cases.
</li>
<li>Fear intensifies.
</li>
</ol>

<p>
Fear starts when an innocuous change goes badly. Maybe a production
outage results, or maybe just an embarrassing bug. It may be a bug
that gets upper management attention. Nothing instills fear like an
executive committee meeting about <span class="underline">your</span> code defect!
</p>

<p>
This sphincter-shrinker originated because a developer couldn't
predict all the ramifications of a change. Maybe the test suite was
inadequate. Or there are special cases that are only observed in
production. (E.g., that one particular customer whose data setup is
different than everyone else.) Whatever the specific cause, the
general result is, "I didn't know that would happen."
</p>

<p>
Add a few of these events into the company lore and you'll find that
developers and project managers become loath to touch anything outside
their narrow scope. They seek local safety.
</p>

<p>
The trouble with local safety is that it requires kludges. The code
base will inevitably deteriorate as pressure for larger changes and
broader refactoring builds without release.
</p>

<p>
The vicious cycle is completed when one of those local kludges is
responsible for someone else's "What? I didn't know that!" moment. At
this point, the fear cycle is self-sustaining. The cost of even small
changes will continue to increase without limit. The time needed to
get changes released will increase as well.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Breaking Point</h2>
<div class="outline-text-2" id="text-1">
<p>
One of several things will happen:
</p>

<ol class="org-ol">
<li>A big bang rewrite (usually with a different team.) The focus will
be "this time, we do it <span class="underline">right</span>!" See also:
second system syndrome,
Things You Should Never Do, Part I.
</li>
<li>Large scale outsourcing.
</li>
<li>Sell off the damaged assets to another company.
</li>
</ol>
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Avoiding the Cycle</h2>
<div class="outline-text-2" id="text-2">
<p>
The fear cycle starts when people treat a technical problem as a
personal one. The first time a seemingly simple change causes a large
and unpredictable effect, you need to convene a technical SWAT team to
determine why the system allowed it to happen and what technical
changes can avoid it in the future.
</p>

<p>
The worst response to a negative event is a tribunal.
</p>

<p>
Sadly, the difference between a technical SWAT team and a tribunal is
mostly in how the individuals in that group approach the issue. Wise
leadership is required to avoid the fear cycle. Look to people with
experience in operations or technical management.
</p>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Breaking the Cycle</h2>
<div class="outline-text-2" id="text-3">
<p>
Like many reinforcing loops in an organization, the fear cycle is
wickedly hard to break. So far, I have not observed any instance of a
company successfully breaking out of it. If you have, I would be very
interested to hear your experiences!
</p>
</div>
</div>
]]></content></entry><entry><title>Components and Glue</title><link href="https://michaelnygard.com/blog/2015/06/components-and-glue/"/><id>https://michaelnygard.com/blog/2015/06/components-and-glue/</id><published>2015-06-17T07:08:34-05:00</published><updated>2015-06-17T07:08:34-05:00</updated><content type="html"><![CDATA[
<p>
There's a well-known architectural style in desktop applications
called "Components and Glue". The central idea is that independent
components are composed together by a scripting layer. The glue is
often implemented in a different or more dynamic language than the
components.
</p>

<p>
The C2 wiki's page on <a href="http://c2.com/cgi/wiki?ComponentGlue">ComponentGlue</a> has been stable since 2004, so
obviously this is not a new idea.
</p>

<p>
Emacs is one example of this approach. The components are written in
C, the glue is ELisp. (To be fair, though, the ELisp outnumbers the C
by a pretty large factor.)
</p>

<p>
Perl was originally conceived as a glue language.
</p>

<p>
Visual Basic applications also followed this pattern. Components
written in C or C++, glue in VB itself.
</p>

<p>
I think Components and Glue is a relevant architecture style today,
especially if we want to compose and recompose our services in novel
ways.
</p>

<p>
My last several posts have been about decomposing services into
smaller, more independent units. Each one could be its own micro-SaaS
business. Some application needs to stitch these back together. I
often see this done in a separate layer that presents a simplified
interface to the applications.
</p>

<img src="/images/blog/2015-06-17-components-and-glue/common-architecture.png">
<div class="caption"></div>


<p>
This glue layer may be written in a different language than the services
themselves. For that matter, the individual services may be written in
a variety of languages, but that's a subject for a different time.
</p>

<p>
The glue layer changes more rapidly than the back end services,
because it needs to keep serving the applications as they change. Even
when the back end services are provided by an enterprise IT group, the
integration layer will be more affiliated with the front end web &amp; app
teams.
</p>

<p>
We embrace plurality, so if there's one glue layer, there may be
more. We should allow multiple glue layers, where each one is adapted
to the needs of its consumers. That begins to look like this:
</p>

<img src="/images/blog/2015-06-17-components-and-glue/multivalent.png">
<div class="caption"></div>


<p>
The smaller and lighter we make the glue, the faster we can adapt
it. The endpoint of that progression looks like
<a href="http://aws.amazon.com/lambda/">AWS Lambda</a> where every piece of script gets its own URL. Hit the URL
to invoke the script and it can hit services, reshape the results, and
reply in a client-specific format.
</p>
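<p>
To make that concrete, here is a rough sketch of one such piece of
glue. It is not tied to any particular platform's handler signature.
The two service functions, their field names, and the "mobile" reply
format are all invented for the example, and the services are stubbed
so the script runs on its own.
</p>

```python
import json

# Stand-ins for back end services. In a real system these would be HTTP
# calls; here they are stubbed so the sketch is self-contained.
def fetch_item(sku: str) -> dict:
    return {"sku": sku, "title": "Blue Widget", "attrs": {"color": "blue"}}

def fetch_price(sku: str) -> dict:
    return {"sku": sku, "amount_cents": 1999, "currency": "USD"}

def mobile_item_glue(sku: str) -> str:
    """Glue script: call two services, reshape, reply in one client's format."""
    item = fetch_item(sku)
    price = fetch_price(sku)
    reply = {
        "id": item["sku"],
        "name": item["title"],
        "price": f'{price["amount_cents"] / 100:.2f} {price["currency"]}',
    }
    return json.dumps(reply, sort_keys=True)
```

<p>
Nothing in the back end services changes when the mobile client wants
a different shape. Only this small script does.
</p>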

<p>
Once we reach that terminus, we can even think of individual functions
as having URLs. Like one-off scripts in ELisp or Perl, we can write
glue for incidental needs: one-time marketing events, promotions,
trial integrations, and so on.
</p>

<p>
"Scripts as glue" also lets us deal with a tension that often arises
with valuable customers. Sometimes the biggest whales also demand a
lot of customization. How should we balance the need to customize our
service for large customers (the whales) and the need to generalize to
serve the entire market? We can create suites of scripts that present
one or more customer-specific interfaces, while the interior of our
services remains generalized.
</p>

<img src="/images/blog/2015-06-17-components-and-glue/multivalent-and-connected.png">
<div class="caption"></div>


<p>
This also allows us to handle one of the hardest cases: when a
customer wants us to "plug in" their own service in lieu of one of
ours. As I've said before, all our services use full URLs for
identifiers, so we should be able to point those URLs at our outbound
customer glue. That glue calls the customer service according to its
API and returns results according to our formats.
</p>
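<p>
A sketch of that outbound glue might look like the following. The
customer API, its field names, and the URL layout are hypothetical. The
point is only that translation in both directions lives in the glue.
</p>

```python
from urllib.parse import urlparse

# Hypothetical customer API: it takes a bare item code and returns stock
# as {"itemCode": ..., "qtyOnHand": ...}. Stubbed here for illustration.
def customer_inventory_api(item_code: str) -> dict:
    return {"itemCode": item_code, "qtyOnHand": 7}

def outbound_customer_glue(item_url: str) -> dict:
    """Translate our request to the customer's API and their reply to our format."""
    # Our identifiers are full URLs; the customer's API wants a bare code.
    item_code = urlparse(item_url).path.rstrip("/").split("/")[-1]
    theirs = customer_inventory_api(item_code)
    return {"item": item_url, "available": theirs["qtyOnHand"]}
```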

<p>
The components and glue pattern remains viable. As we decompose
monoliths, it is a great way to achieve separation between services
without undue burden on the front end applications and their
developers.
</p>
]]></content></entry><entry><title>Faceted Identities</title><link href="https://michaelnygard.com/blog/2015/06/faceted-identities/"/><id>https://michaelnygard.com/blog/2015/06/faceted-identities/</id><published>2015-06-12T06:48:23-05:00</published><updated>2015-06-12T06:48:23-05:00</updated><content type="html"><![CDATA[
<p>
I have a rich and multidimensional relationship with Amazon. It
started back in 1996 or 1997, when it became the main supplier for my
book addiction. As the years went by, I became an "Amazon Affiliate"
in a futile attempt to balance out my cash flow with the
company. Later, I started using AWS for cloud computing. I also
claimed my <a href="http://amzn.to/1BaQ0tm">author page</a>.
</p>

<p>
Let's contemplate the data architecture needed to maintain such a set
of relationships. Let's assume for the moment that Amazon were using a
SQL RDBMS to hold it all. The obvious approach is something I could
call the "Big Fat User Table". One table, keyed by my secret, internal
user ID, with columns for all the different possible things a user can
be to Amazon. There would be a dozen columns for my affiliate status,
a couple for my author page, a boolean to show I've signed up for AWS,
and a bunch of booleans for each of the individual services.
</p>

<p>
Such a table would be an obvious bottleneck. Any DBA worth
her salt would split that sucker into many tables, joined by a common
key (the user ID.) New services would then just add a table in their
own database with the common user ID. Let's call this approach the
"Universal Identifier" design.
</p>

<p>
That would also allow one-to-many relations for some aspects. For
example, when I lived in Minnesota, the state demanded that Amazon
keep <a href="http://www.twincities.com/ci_23485656/amazon-ending-affiliate-relationships-avoid-minnesotas-online-sale">track of tax</a> for each affiliate. Amazon responded by shutting
down all the affiliate accounts in Minnesota. I recently moved to
Florida and was able to open a new account with my new address. So I
have two affiliate accounts attached to my user account.
</p>

<p>
For what it's worth, column family databases would kind of blur the
lines between the Big Fat User Table and the Universal Identifier
design.
</p>

<p>
We can get more flexible than the Universal Identifier, though.
</p>

<p>
You see, if we push the User ID into all the various services, that
implies that the "things" that service manages can <span class="underline">only</span> be consumed
by a User. Maneuverable architecture says we should be able to
recompose services in novel configurations to solve business
problems.
</p>

<p>
Instead of pushing the User ID into each service, we should just let
each service create IDs for its "things" and return them to us.
</p>

<p>
For example, a Calendar Service should be willing to create a new
calendar for anyone who asks. It doesn't need to know the ID of the
owner. Later, the owner can present the calendar ID as part of a
request (usually somewhere in the URL) to add events, check dates, or
delete the calendar. Likewise, a Ledger service should be willing to
create a new ledger for any consumer, to be used for any purpose. It
could be a user, a business, or a one-time special partnership. The
calls could be coming from a long-lived application, a bit of script
hooked to a URL, or <code>curl</code> in a bash script. Doesn't matter.
</p>
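<p>
Here is a minimal in-memory sketch of that idea. The class and method
names are mine, not any real API, and a production service would sit
behind HTTP with the calendar ID in the URL.
</p>

```python
import uuid

class CalendarService:
    """Creates calendars for any caller; it never learns who the owner is."""

    def __init__(self) -> None:
        self._calendars: dict[str, list[tuple[str, str]]] = {}

    def create_calendar(self) -> str:
        calendar_id = uuid.uuid4().hex  # opaque ID issued by the service
        self._calendars[calendar_id] = []
        return calendar_id

    def add_event(self, calendar_id: str, date: str, title: str) -> None:
        self._calendars[calendar_id].append((date, title))

    def events_on(self, calendar_id: str, date: str) -> list[str]:
        return [t for d, t in self._calendars[calendar_id] if d == date]

    def delete_calendar(self, calendar_id: str) -> None:
        del self._calendars[calendar_id]
```

<p>
Notice that there is no user ID anywhere in the service. Whoever holds
the calendar ID decides what that calendar means.
</p>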

<p>
If we've got all these services issuing identifiers, we need some way
to stitch them back together. That's where the faceted identities come
in. If we start from a user and follow all the related "stuff"
connected to that user, it looks a lot like a graph.
</p>

<img src="/images/blog/2015-06-12-faceted-identities/user-graph.png">
<div class="caption"></div>


<p>
When a user logs in to the customer-facing application, that app is
responsible for traversing the graph of identities, making requests to
services, and assembling the response.
</p>
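<p>
As a rough illustration, the application's graph can be as simple as a
mapping from one identifier to the identifiers hanging off of it. The
URLs and the function below are hypothetical. A real application would
follow each URL with a request to the service that issued it.
</p>

```python
# Hypothetical identity graph held by one application. Keys and values are
# URLs issued by independent services; none of those services know about
# the user or about each other.
IDENTITY_GRAPH = {
    "https://accounts.example.com/users/42": [
        "https://calendars.example.com/c/9f1",
        "https://ledgers.example.com/l/77a",
    ],
    "https://ledgers.example.com/l/77a": [
        "https://tax.example.com/profiles/fl-3",
    ],
}

def facets_of(root: str, graph: dict[str, list[str]]) -> list[str]:
    """Depth-first walk from a principal to every identifier reachable from it."""
    seen: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Reverse so neighbors are visited in the order they are listed.
        stack.extend(reversed(graph.get(node, [])))
    return seen[1:]  # everything except the root itself
```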

<p>
I hope you aren't surprised when I say that different applications may
hold different graphs, with different principals as their roots. That
goes along with the idea that there's no privileged vantage
point. Every application gets to act like the center of its own
universe.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Going Meta</h2>
<div class="outline-text-2" id="text-1">
<p>
If you've been schooled in database design, this probably looks a
little weird. I'm removing the join keys from the relational
databases. (Some day soon I need to write a post addressing a common
misconception: that "relational" databases got their name because they
let you relate tables together.)
</p>

<p>
The key issue I'm aiming at is really about logical dependencies in
the data. Foreign key relationships are a policy statement, not a law
of nature. Policies change on short notice, so they should be among
the most malleable constructs we have. By putting that policy in the
bottommost layer of every application, we make it as hard as possible
to change!
</p>

<p>
We can think of a hierarchy of "looseness" in relationships:
</p>

<ul class="org-ul">
<li>Two ideas, stored in one entity: As coupled as it gets. Neither idea
can be used without the other. (An "entity" here can be a table or
linked data resources with URLs. It's not about the storage, but about
the required relationship.)
</li>
<li>Two ideas, two entities, one-to-one: Still, both ideas must be used
together.
</li>
<li>Two ideas, two entities, one-to-one optional: Now we can at least
decide whether the second item is needed with the first.
</li>
<li>Two ideas, two entities, one-to-many: This admits that the second
idea may come in different quantities than the first.
</li>
<li>Two ideas, two entities, many-to-many: Much more flexible! Both
ideas can be combined in differing quantities as needed. However,
this still requires that these ideas are only used together with
each other. In other words, if ideas X and Y have a many-to-many
relationship, I don't get to reuse idea X together with idea A.
</li>
<li>Two ideas, externalized relationship: This is the heart of faceted
identities. Ideas X and Y can be completely independent. Each can be
combined with other ideas by other applications.
</li>
</ul>
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Interface Segregation Principle</h2>
<div class="outline-text-2" id="text-2">
<p>
The "I" in SOLID stands for Interface Segregation Principle. It says
that a client should only depend on an interface with the minimum set
of methods it needs. An object may support a wide set of behavior, but
if my object only needs three of those behaviors, then I should depend
on an interface with precisely those three behaviors. (One hopes those
three make sense together!)
</p>
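<p>
In code, that might look like the sketch below. The class names are
invented for illustration, and I've narrowed the client's needs to a
single behavior to keep it short.
</p>

```python
from typing import Protocol

class EventReader(Protocol):
    """The narrow interface this client actually needs."""
    def events_on(self, date: str) -> list[str]: ...

class FullCalendar:
    """Supports a wide set of behavior; most clients need only part of it."""
    def __init__(self) -> None:
        self._events: dict[str, list[str]] = {}

    def add_event(self, date: str, title: str) -> None:
        self._events.setdefault(date, []).append(title)

    def events_on(self, date: str) -> list[str]:
        return list(self._events.get(date, []))

    def delete_all(self) -> None:
        self._events.clear()

def is_busy(calendar: EventReader, date: str) -> bool:
    # Depends only on EventReader, so any object with events_on() will do.
    return len(calendar.events_on(date)) > 0
```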

<p>
This has an application when we use faceted identities as
well. Sometimes we have a very nice separation where the facets don't
need to interact with each other, only the application interacts with
all of them. More often though, we do need to pass an identifier from
one kind of thing into another. That's when the contract becomes
important. If service Y requires a foreign identifier "X" to perform
an action, then it needs to be clear about what it will do with
"X". It's up to the calling application to ensure that the "X" it
passes can perform those actions.
</p>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Summary</h2>
<div class="outline-text-2" id="text-3">
<p>
Maneuverability is all about composing, recomposing, and combining
services in novel configurations. One of the biggest impediments to
that is relationships among entities. We want to make those as loose
as possible by externalizing the relationships to another
service. This allows entities to be used in new ways without
coordinated change across services. Furthermore, it allows different
applications to use different relationship graphs for their own
purposes.
</p>
</div>
</div>
]]></content></entry><entry><title>Inverted Ownership, Part 2</title><link href="https://michaelnygard.com/blog/2015/05/inverted-ownership-part-2/"/><id>https://michaelnygard.com/blog/2015/05/inverted-ownership-part-2/</id><published>2015-05-26T06:22:39-05:00</published><updated>2015-05-26T06:22:39-05:00</updated><content type="html"><![CDATA[<p>
My last post on the subject of <a href="http://www.michaelnygard.com/blog/2015/05/inverted-ownership/">inverted ownership</a> felt a bit abstract,
so I thought I might illustrate it with a typical scenario.
</p>

<p>
In this first figure, we see a newly-extracted <code>Catalog</code> service,
freshly factored out of the old monolithic application. It's part of
the company's effort to become more maneuverable. We don't know, or
particularly care, what storage model it uses internally. From the
outside, it presents an interface that looks like "SKUs have
attributes".
</p>

<img src="/images/blog/2015-05-26-inverted-ownership-part-2/fig-1.png">
<div class="caption"></div>


<p>
All seems well. It looks and smells like a microservice: independently
deployable, released on its own schedule by a small autonomous team.
</p>

<p>
The problem is what you <b>don't</b> see in the picture: context. This
service has one "universe" of SKUs. It doesn't serve catalogs. It
serves <b>one</b> catalog. The problem becomes evident when we start asking
what consumers of this service would want. If we think of the online
storefront as the only consumer then it looks fine. Ask around a bit,
though, and you'll find other interested parties.
</p>

<img src="/images/blog/2015-05-26-inverted-ownership-part-2/fig-1-1.png">
<div class="caption"></div>


<p>
While IT toils to get down to a single source of record for product
information, the wheelers and dealers in the business are out there
signing up partners, inventing marketing campaigns, and looking into
new lines of business. Pretty much all of those are going to screw
around with the very idea of "the catalog".
</p>

<p>
Maneuverability demands that we can combine and recombine our services
in novel ways. What can we do with this catalog service that would let
it be reused in ways that the dev team didn't foresee?
</p>

<p>
Instancing might be one approach&#x2026; multiple deployments from the same
code base. High operational overhead, but it's better than being
stuck.
</p>

<p>
I prefer to make the context explicit instead.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Zero, One, Many</h2>
<div class="outline-text-2" id="text-1">
<p>
There's an old saying that the only sensible numbers are zero, one,
and infinity. One catalog isn't enough, so the right number to support
is "infinity." (Or some resource-constrained approximation.)
</p>

<p>
What does it take? All we have to do is make catalog service create
catalogs for anyone who asks. Any consumer that needs a catalog can
create one. That might be a big, sophisticated online storefront. But
it could be someone using cURL to manually construct a small catalog
for a one-off marketing effort. The catalog service shouldn't care who
wants the catalog or what purpose they are going to put it to.
</p>

<img src="/images/blog/2015-05-26-inverted-ownership-part-2/fig-3-1.png">
<div class="caption"></div>


<p>
Of course, this means that subsequent requests need to identify
<b>which</b> catalog the item comes from. Good thing we're already using
URLs as our identifiers.
</p>
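<p>
A small sketch of what that could look like, kept in memory for
brevity. The class, the method names, and the URL layout are
assumptions for the example, not a prescribed API.
</p>

```python
import itertools

class CatalogsService:
    """Holds many catalogs; every item request names its catalog by URL."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")
        self._ids = itertools.count(1)
        self._catalogs: dict[str, dict[str, dict]] = {}

    def create_catalog(self) -> str:
        url = f"{self._base_url}/catalogs/{next(self._ids)}"
        self._catalogs[url] = {}
        return url

    def put_sku(self, catalog_url: str, sku: str, attributes: dict) -> str:
        self._catalogs[catalog_url][sku] = dict(attributes)
        return f"{catalog_url}/skus/{sku}"

    def get_sku(self, sku_url: str) -> dict:
        # The catalog is recovered from the SKU's own URL.
        catalog_url, _, sku = sku_url.rpartition("/skus/")
        return self._catalogs[catalog_url][sku]
```

<p>
The same SKU code can live in the storefront's catalog and in a
one-off promotional catalog with different attributes, and the two
never collide because their URLs differ.
</p>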
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Considerations</h2>
<div class="outline-text-2" id="text-2">
<p>
There are some practical issues (and maybe objections) to address.
</p>

<p>
First, does this mean that the SKUs are duplicated across all those
catalogs? Not necessarily. We're talking about the interface the
service presents to consumers. It can do all kinds of deduplication
internally. See my post about the <a href="http://thinkrelevance.com/blog/2013/07/08/shopping-in-infinite-space">immutable shopping cart</a> for some
ideas about deduplication and "natural" identifiers.
</p>

<p>
Second, and trickier, how do the SKUs get associated to the catalog?
Does each microsite and service need to populate its own catalog? Can
it just cherry-pick items from a "master" catalog?
</p>

<p>
You can probably guess that I don't much like the idea of a "master"
catalog. Instead, we would populate a newly-minted catalog by feeding
it either item representations (serialized data in a well-known
format) or better yet, hyperlinks that resolve to item
representations.
</p>

<p>
How about this: make the service support HTML, RDFa, and a
standardized microformat as a representation. Then you just feed your
catalog service with URLs that point to HTML. Those can come from a
catalog of your own, an internal app for cleansing data feeds, or even
a partner or vendor's web site. Now you've unified channel feeds, data
import, and catalog creation.
</p>

<p>
Third, is it really true that just anyone can create a catalog?
Doesn't this open us up to denial-of-service attacks wherein someone
could create billions of catalogs and goop up our database? My
response is that we don't ignore questions of authorization and
permission, but we do separate those concerns. We can use proxies at
trust boundaries to enforce permission and usage limits.
</p>
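<p>
As a sketch of that separation, the proxy below enforces a per-caller
limit while the service behind it stays ignorant of callers
entirely. All of the names here are hypothetical, and a real proxy
would identify callers from credentials rather than a string argument.
</p>

```python
class QuotaExceeded(Exception):
    pass

class CatalogBackend:
    """Stand-in for the real service; it knows nothing about callers."""
    def __init__(self) -> None:
        self.count = 0

    def create_catalog(self) -> str:
        self.count += 1
        return f"/catalogs/{self.count}"

class QuotaProxy:
    """Sits at the trust boundary and caps catalogs created per caller."""
    def __init__(self, backend: CatalogBackend, limit_per_caller: int) -> None:
        self._backend = backend
        self._limit = limit_per_caller
        self._created: dict[str, int] = {}

    def create_catalog(self, caller: str) -> str:
        used = self._created.get(caller, 0)
        if used >= self._limit:
            raise QuotaExceeded(caller)
        self._created[caller] = used + 1
        return self._backend.create_catalog()
```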
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">Conclusion</h2>
<div class="outline-text-2" id="text-3">
<p>
When you make the context explicit, you allow a service to support an
arbitrary number of consumers. That includes consumers that don't
exist today and even ones you can't predict. Each service then becomes
a part that you can recombine in novel ways to meet future needs.
</p>
</div>
</div>
]]></content></entry><entry><title>Inverted Ownership</title><link href="https://michaelnygard.com/blog/2015/05/inverted-ownership/"/><id>https://michaelnygard.com/blog/2015/05/inverted-ownership/</id><published>2015-05-08T09:44:43-04:00</published><updated>2015-05-08T09:44:43-04:00</updated><content type="html"><![CDATA[<p>
One of the sources of <a href="http://www.michaelnygard.com/blog/2015/04/the-perils-of-semantic-coupling/">semantic coupling</a> has to do with identifiers,
and especially with synthetic identifiers. Most identifiers are just
alphanumeric strings. Systems share those identifiers to ensure that
both sides of an interface agree on the entity or entities they are
manipulating.
</p>

<p>
In the move to services, there is an unfortunate tendency to build in
a dependency on an ambient space of identifiers. This limits your
organization's maneuverability.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">Contextualized IDs</h2>
<div class="outline-text-2" id="text-1">
<p>
The trouble is that a naked identifier doesn't tell you what space of
identifiers it comes from. There is just this ambient knowledge that a
field called <code>Policy ID</code> is issued from the System of Record for
policies. That means there can only be one "space" of policy numbers,
and they must all be issued by the same SoR.
</p>

<p>
I don't believe in the idea of a single system of record. One of my
rules for architecture without an end state is "Embrace
Plurality". Whether through business changes or system migrations, you
will always end up with multiple systems of record for any concept or
entity.
</p>

<p>
In that world, it's important that IDs carry along their context. It
isn't enough to have an alphanumeric <code>Policy ID</code> field. You need a URN
or URI to identify <b>which</b> policy system issued that policy number.
</p>
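<p>
For illustration, here is one way a consumer might pull the context
back out of such an identifier. The host names and path layout are
made up. The only assumption that matters is that the ID is a full URI
rather than a bare string.
</p>

```python
from urllib.parse import urlparse

def issuer_and_number(policy_uri: str) -> tuple[str, str]:
    """Split a contextualized policy ID into (issuing system, policy number)."""
    parts = urlparse(policy_uri)
    if not parts.netloc:
        raise ValueError("a naked identifier does not say who issued it")
    number = parts.path.rstrip("/").split("/")[-1]
    return parts.netloc, number
```

<p>
Two policy systems can now issue the same policy number without
ambiguity, because the issuer travels with the ID.
</p>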
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">Liberal Issuance</h2>
<div class="outline-text-2" id="text-2">
<p>
Imagine a <code>Calendar</code> service that tracks events by date and time. It
would seem weird for that service to keep all events for every user in
the same calendar, right? We should really think of it as a
<code>Calendars</code> service. I'd expect to see an API to create a calendar, which
returns the URL to "my" calendar. Every other API call then includes
that URL, either as a prefix or a parameter.
</p>

<p>
In the same way, your services should serve all callers and allow them
to create their own containers. If you're building a <code>Catalog</code>
service, think of it as a <code>Catalogs</code> service. Anybody can create a
catalog, for any purpose. Likewise, a <code>Ledger</code> service should really
be <code>Ledgers</code>. Any client can create a ledger for any reason.
</p>
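
<p>
As a rough sketch, a <code>Calendars</code> service along these lines might look
like the following. This is an in-memory toy under my own assumptions:
the class name, method names, and the <code>calendars.example.com</code> URL are
hypothetical, not any real API.
</p>

```python
import uuid


class CalendarsService:
    """Hypothetical in-memory service: any caller may create a calendar."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._calendars: dict[str, list[dict]] = {}

    def create_calendar(self) -> str:
        """Issue a new calendar and return its URL."""
        url = f"{self.base_url}/calendars/{uuid.uuid4()}"
        self._calendars[url] = []
        return url

    def add_event(self, calendar_url: str, title: str, when: str) -> None:
        """Every other call names the calendar it applies to."""
        self._calendars[calendar_url].append({"title": title, "when": when})

    def events(self, calendar_url: str) -> list[dict]:
        """Return a copy of the events in one calendar."""
        return list(self._calendars[calendar_url])


service = CalendarsService("https://calendars.example.com")
mine = service.create_calendar()
yours = service.create_calendar()
service.add_event(mine, "Architecture review", "2015-05-01T10:00")
```

<p>
The service never asks who the caller is or why they want a calendar.
It issues a container and hands back the URL.
</p>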

<p>
This is the way to create services that can be recombined in novel
ways to create maneuverability.
</p>
</div>
</div>
]]></content></entry><entry><title>The Perils of Semantic Coupling</title><link href="https://michaelnygard.com/blog/2015/04/the-perils-of-semantic-coupling/"/><id>https://michaelnygard.com/blog/2015/04/the-perils-of-semantic-coupling/</id><published>2015-04-29T06:03:21-05:00</published><updated>2015-04-29T06:03:21-05:00</updated><content type="html"><![CDATA[<p>
On the subject of maneuverability, many organizations run into trouble
when they try to enter new lines of business, create a partnership, or
merge with another company. Updating enterprise systems becomes a
large cost factor in these business initiatives, sometimes large
enough to outweigh the benefits case. This is a terrible irony: our
automation provides efficiency, but removes flexibility.
</p>

<p>
If you break down the cost of such changes, you'll find it comes in
equal parts from changes to individual systems and changes to
integrations across systems. Integrations are always costly and full
of risk, and never more so than when we are changing
cardinalities. Partnerships and mergers pretty much always change
cardinalities, too.
</p>

<p>
The cost factor arises from "semantic coupling." That is the coupling
between services introduced because the services need to share
concepts. It usually appears as data types or entity names that pop up
in many services.
</p>

<p>
As an example, let's think about a tiny retailing system with a small
set of what I'll call "macroservices". One of the most important
entity types here is the Stock Keeping Unit, or SKU. It represents "a
thing which can be sold". In a typical retail system, it has a large
number of attributes that describe how the item is priced, delivered,
displayed on the web, upsold and cross-sold, reviewed, categorized,
and taxed.
</p>

<p>
SKUs are created in a master data management system. There may be a
variety of feeds that get massaged into MDM, but we'll consider that
to be outside the boundary of our interest for now. From MDM, the SKU
must be distributed to a number of other services:
</p>

<img src="/images/blog/2015-04-29-the-perils-of-semantic-coupling/mdm-sku.png">
<div class="caption"></div>


<p>
Each of these macroservices uses aspects of the SKU for its own
purpose. Content management attaches "telling and selling" content to
the SKU so it can be presented nicely on the web. Pricing adds it to
the pricing rules. Shipping identifies the carriers, options, and
costs to deliver it. Order management&#x2013;probably a great big silver
beast of a system&#x2013;tracks inventory, orders, delivery rules, returns,
and a lot more.
</p>

<p>
Now what happens if we have to make a major change to the SKU? Let's
imagine that we want to change how we manage prices. In the past,
merchants set prices on each item individually. Now, we've got too
much in the catalog for that to scale so we introduce the idea of
price points for digital items. A price point is a price that applies
to a large number of SKUs. When we change the price point, all SKUs
that refer to it should be changed at the same time. So, if we decide
to reduce the price of a low-bitrate MP3 track from $0.99 to $0.89, we
can just change a single price point record.
</p>

<p>
How many systems do we have to change for this new concept?
</p>

<p>
If we consider "price point" to be part of our core domain, then we
have to add that concept everywhere. The surface area of that change
is really large, and it will be a costly change to make. It might even
be too costly to be worth doing. We could hire a small army of temp
workers to update price records by hand twice a year and still come
out ahead. That's not a very satisfying answer though. All this
automation is supposed to make us more efficient! What good is it if
we are stuck with outdated processes because our systems are too hard
to change?
</p>

<p>
The key problem is semantic coupling. There are a lot of systems here
that shouldn't need to care about the "price point" concept. It has no
bearing on the digital locker, shipping, or ratings &amp; reviews.
</p>

<p>
In this example, we can reduce the semantic coupling. Simply decide
that "price point" is not a core concept. It is a detail of data
management for the MDM system. Everything downstream receives SKUs
with a list price. No downstream system should care how that list
price was determined.
</p>

<p>
This decision flattens a many-to-one relationship from SKU to price
point. In so doing, we get a huge benefit. We eliminate an entire
entity and all references to it from all the downstream systems.
</p>
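
<p>
To make the flattening concrete, here is a small sketch in Python. The
record shapes, the <code>publish</code> function, and all of the data are
assumptions of mine for illustration. The point is only that the price
point is resolved inside MDM and never crosses the boundary.
</p>

```python
from decimal import Decimal

# Inside MDM: many SKUs refer to one price point (all data is made up).
price_points = {"pp-low-bitrate-mp3": Decimal("0.99")}
skus = [
    {"sku": "track-001", "price_point": "pp-low-bitrate-mp3"},
    {"sku": "track-002", "price_point": "pp-low-bitrate-mp3"},
]


def publish(sku: dict) -> dict:
    """Flatten a SKU for downstream systems: list price only, no price point."""
    return {"sku": sku["sku"], "list_price": price_points[sku["price_point"]]}


# Changing one price point record changes every SKU that refers to it.
price_points["pp-low-bitrate-mp3"] = Decimal("0.89")
downstream = [publish(s) for s in skus]
```

<p>
Downstream systems see two SKUs with a new list price. They have no
field, table, or class that mentions a price point.
</p>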

<p>
I would even make a case for shattering the concept of SKU into
multiple separate concepts. MDM may keep that concept. Downstream,
though, each system has its own set of internal concepts. We should
treat identifiers from other systems as opaque tokens that we map onto
our own system's space.
</p>

<img src="/images/blog/2015-04-29-the-perils-of-semantic-coupling/ids-as-function.png">
<div class="caption"></div>


<p>
For example, the pricing service doesn't need to know that it is
pricing SKUs. It just needs to price "things that can be priced." I
know, it sounds tautological, but I think we get misled as
humans&#x2026; we think of SKU as a unitary concept so we build it as such
in our systems. But look what happens if we say a pricing service can
price "stuff and things" as long as they have some mapping in the
pricing service itself. We can add an entirely new universe of things
to price, <span class="underline">without forcing everything on Earth to be a SKU</span>!
</p>
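
<p>
A sketch of that mapping, again in Python and again hypothetical: the
<code>PricingService</code> class, its method names, and the source names
<code>"mdm"</code> and <code>"subscriptions"</code> are mine, chosen for illustration.
</p>

```python
from decimal import Decimal
from itertools import count


class PricingService:
    """Hypothetical pricing service that prices "things that can be priced"."""

    def __init__(self) -> None:
        self._next_id = count(1)
        # (source system, opaque token) -> identifier in our own space
        self._by_external: dict[tuple[str, str], int] = {}
        self._prices: dict[int, Decimal] = {}

    def register(self, source: str, token: str) -> int:
        """Map an opaque token from another system onto our own identifier."""
        key = (source, token)
        if key not in self._by_external:
            self._by_external[key] = next(self._next_id)
        return self._by_external[key]

    def set_price(self, priceable_id: int, price: Decimal) -> None:
        self._prices[priceable_id] = price

    def price_of(self, source: str, token: str) -> Decimal:
        return self._prices[self._by_external[(source, token)]]


pricing = PricingService()
track = pricing.register("mdm", "SKU-12345")           # a SKU from MDM
plan = pricing.register("subscriptions", "gold-plan")  # not a SKU at all
pricing.set_price(track, Decimal("0.89"))
pricing.set_price(plan, Decimal("9.99"))
```

<p>
The pricing service never parses or interprets the tokens. It only
remembers which of its own identifiers each one maps to.
</p>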

<p>
We should scrutinize each of the other services, asking ourselves,
"Does this <span class="underline">really</span> care about a SKU? Or does it care about something
that a SKU happens to possess?" I would argue that in each case, the
service really cares about "Thing that can be Xed". Priced, taxed,
shipped, reviewed, etc. Are SKUs the only things that can be taxed?
Are they the only things that can be reviewed? Etc.
</p>

<p>
Iterate this process and four things will happen:
</p>

<ol class="org-ol">
<li>Your services will shrink.
</li>
<li>Your services will become much more general.
</li>
<li>Each service will own its own space of identifiers.
</li>
<li>Your organization will become more maneuverable.
</li>
</ol>

<p>
The key point I want to make here is that a concept may appear to be
atomic just because we have a single word to cover it. Look hard
enough and you will find seams where you can fracture that
concept. Don't share the whole thing. Don't couple all your downstream
systems to the whole concept, and <span class="underline">definitely</span> don't couple your
downstream to a complex of related concepts! It's a cardinal sin.
</p>
]]></content></entry><entry><title>Maneuverability</title><link href="https://michaelnygard.com/blog/2015/04/maneuverability/"/><id>https://michaelnygard.com/blog/2015/04/maneuverability/</id><published>2015-04-23T12:13:21+02:00</published><updated>2015-04-23T12:13:21+02:00</updated><content type="html"><![CDATA[
<p>
Agile development works best at team scale. When a team can
self-organize, refine their methods, build the toolchain, and
adapt it to their needs, they will execute effectively. We should be
happy to achieve that! I worry when we try to force-fit the same
techniques at larger scales.
</p>

<p>
At the scale of a whole organization, we need to look at the qualities
we want to have. (We can't necessarily produce those qualities
directly, but we can create the conditions that allow them to emerge.)
When we look at attempts to scale agile development up, the quality
the org wants is <span class="underline">maneuverability</span>.
</p>

<p>
Maneuverability is the ability to change your vector rapidly. It's
about gaining, shedding, or redirecting momentum. Keeping with the
analogy of momentum, we can call that which resists change in the
momentum vector "inertial mass." Personnel are mass, because it's
relatively hard to add or shed personnel. Technical debt is a
component of mass, too. It makes changes to your technical strategy
harder. Actually, I'd even go so far as to say that code itself is
mass. KLOCs kill.
</p>

<p>
Maneuverability has been explored most fully by the military. Superior
maneuverability allows a fighter aircraft to get inside the enemy's
turn radius, then shoot for the kill. An army with high
maneuverability can engage, disengage, and reorient to exploit an
enemy's weakness. In the words of John Boyd, it allows you to separate
your opponent into multiple, non-cooperating centers of gravity.
</p>

<p>
Maneuverability is an emergent property. It requires a number of
prerequisites in the organization's structure, leadership style,
operations, and ability to execute. I firmly believe that
maneuverability requires a great ability to execute at the micro
scale.
</p>

<p>
Agile development provides that ability to execute in software
development. It is a necessary, but not sufficient, part of
maneuverability. There are other necessary capabilities in the
technical arena. I think that infrastructure and architecture have
important roles to play for maneuverability as well.
</p>

<p>
I have previously given talks on the subject of maneuverability. I'll
also be posting some further thoughts about pertinent architecture
decisions.
</p>
]]></content></entry><entry><title>Bad Layering</title><link href="https://michaelnygard.com/blog/2015/04/bad-layering/"/><id>https://michaelnygard.com/blog/2015/04/bad-layering/</id><published>2015-04-14T06:28:22-05:00</published><updated>2015-04-14T06:28:22-05:00</updated><content type="html"><![CDATA[<p>
If I had to guess, I would say that "Layers" is probably the most
commonly applied architecture pattern. And why not? Parfaits have
layers, and who doesn't like a parfait? So layers must be good.
</p>

<p>
Like everything else, though, there's a good way and a bad way.
</p>

<p>
The usual Neapolitan stack looks like this:
</p>
<img src="/images/blog/2015-04-14-bad-layering/typical-layers.png">
<div class="caption"></div>


<p>
On one of my favorite projects of all, we used more layers because we
wanted to further isolate different behaviors. In that project, we
added a "UI Model" distinct from the "Domain."
</p>

<img src="/images/blog/2015-04-14-bad-layering/favorite-project-layers.png">
<div class="caption"></div>


<p>
We impose this style because we want to separate concerns. This should
provide us with two big benefits. First, we can change the contents of
each layer independently. So changes to the GUI should not affect the
domain, and changes to the domain should not affect persistence. The
second benefit we want is the ability to substitute a layer. We may
swap out a layer for the sake of testing (often in the case of
persistence layers) or for different product configurations.
</p>

<p>
People sometimes make an argument for swapping out a layer in case of
technology change. That argument is used for ORMs in the persistence
layer, but I don't find it convincing. Changing persistence on an
existing application is by far <span class="underline">not</span> the most common kind of
change. You'd be buying an expensive option that is seldom exercised.
</p>

<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1">When Good Layers Go Bad</h2>
<div class="outline-text-2" id="text-1">
<p>
The trouble arises when the layers are built such that we have to
drill through several of them to do something common. Have you ever
checked in a commit that had a bunch of new files like "Foo",
"FooController", "FooForm", "FooFragment", "FooMapper", "FooDTO", and
so on? That, dear reader, is a breakdown in layering.
</p>

<p>
It comes from each layer being decomposed along the same dimension. In
this case, aligned by domain concept. That means the domain layer is
dominating the other layers.
</p>

<p>
I would much rather see each layer have objects and functions that
express the fundamental concepts of that layer. "Foo" is not a
persistence concept, but "Table" and "Row" are. "Form" is a GUI
concept, as is "Table" (but a different kind of table than the
persistence one!) The boundary between each layer should be a matter
of translating concepts.
</p>

<p>
In the UI, a domain object should be atomized into its constituent
attributes and constraints. In persistence, it should be atomized into
rows in one or more tables (in SQL-land) or one or more linked
documents.
</p>

<p>
What appears as a class in one layer should be mere data to every
other layer.
</p>
</div>
</div>

<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2">How Does It Happen?</h2>
<div class="outline-text-2" id="text-2">
<p>
This breakdown in layering can arise from more than one dynamic
process.
</p>

<ol class="org-ol">
<li>The application framework may impose this structure.
</li>

<li>The language may not have abstractions powerful enough to make it
pleasant to work with data.
</li>

<li>TDD without enough refactoring. Each thin slice through the
application adds one more strand of "Foo and Friends". Truly
merciless refactoring would pull out the common behavior sideways
into the layer-specific concepts I described above. Lacking
merciless refactoring, the project will accrete sticky strands like
cotton candy on a toddler.
</li>

<li>The team may not have seen it done any other way.
</li>
</ol>
</div>
</div>

<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3">What If It Happens To You?</h2>
<div class="outline-text-2" id="text-3">
<p>
Maybe you already have degenerate layers. Assuming they aren't
required by your framework, start looking for opportunities to
refactor. Don't just build a class hierarchy so you can inherit
implementations. Rather, look for common patterns of
interaction. Figure out how to turn the code you've got in classes
into data acted on by classes relevant to the layer.
</p>

<p>
Use maps. Convert objects into maps from field identifier to an object
that represents the salient aspect of the field for that layer:
</p>

<ul class="org-ul">
<li>For a GUI, those aspects will be something like "lexical type",
"editable", "constraint" / "validation", "semantic class", and so on.
</li>
<li>For persistence, they will deal with "length", "representation
format", "referent," etc.
</li>
</ul>
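
<p>
Here is one way that could look, sketched in Python. The field names,
the aspect keys, and the <code>to_form</code> and <code>to_row</code> functions are
illustrative assumptions, not a prescription.
</p>

```python
# A domain object, seen by the other layers as plain data.
customer = {"name": "Ada Lovelace", "email": "ada@example.com"}

# GUI layer: what a form needs to know about each field.
gui_fields = {
    "name": {"lexical_type": "text", "editable": True, "max_length": 80},
    "email": {"lexical_type": "email", "editable": False, "max_length": 254},
}

# Persistence layer: what a table needs to know about each field.
persistence_fields = {
    "name": {"column": "full_name", "length": 80},
    "email": {"column": "email_addr", "length": 254},
}


def to_form(record: dict, fields: dict) -> list[dict]:
    """Generic GUI code: knows about form inputs, not about customers."""
    return [{"label": key, "value": record[key], **spec} for key, spec in fields.items()]


def to_row(record: dict, fields: dict) -> dict:
    """Generic persistence code: knows about columns, not about customers."""
    return {spec["column"]: record[key] for key, spec in fields.items()}


form = to_form(customer, gui_fields)
row = to_row(customer, persistence_fields)
```

<p>
Neither function mentions a customer. Adding a new domain concept
means adding data, not a new "FooForm" and "FooMapper".
</p>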

<p>
Seek and destroy DTOs. They should be maps.
</p>

<p>
A DTO clearly indicates that your class is crossing a boundary. And
yet, it requires that code on both sides of the boundary codes to the
method signatures of the DTO. That means there is precisely zero
translation at the boundary.
</p>
</div>
</div>

<div id="outline-container-sec-4" class="outline-2">
<h2 id="sec-4">Where To Go From Here</h2>
<div class="outline-text-2" id="text-4">
<p>
Let me be clear, I like parfaits. (Yogurt and fruit! Ice cream, nuts,
caramel!) I have nothing against layers. Most of my applications are
built from layers. It's just that getting the benefits we seek
requires more effort than smearing a single domain concept across
multiple subdirectories.
</p>

<p>
If "Layers" is the only architecture pattern you've used, then you're
in for a treat. There are plenty of other fundamental structures to
explore. Pipes and filters. Blackboard. Components. Set GoF aside and
go read <a href="http://www.amazon.com/Pattern-Oriented-Software-Architecture-Volume-Patterns/dp/0471958697">Pattern-Oriented Software Architecture.</a> The whole series is a
treasure trove and an encyclopedia.
</p>
</div>
</div>
]]></content></entry><entry><title>People Don't Belong to Organizations</title><link href="https://michaelnygard.com/blog/2015/04/people-dont-belong-to-organizations/"/><id>https://michaelnygard.com/blog/2015/04/people-dont-belong-to-organizations/</id><published>2015-04-11T15:45:57-05:00</published><updated>2015-04-11T15:45:57-05:00</updated><content type="html"><![CDATA[
<p>
One company that gets this right is <a href="http://www.github.com">GitHub</a>. I exist as my <a href="https://github.com/mtnygard">own person</a>
there. I'm affiliated with <a href="https://github.com/cognitect">my employer</a> as well as other organizations.
</p>

<p>
We are long past the days of "the company man," when a person's
identity was solely bound to their employer. That relationship is much
more fluid now.
</p>

<p>
A company that gets it wrong is <a href="https://www.atlassian.com/">Atlassian</a>. I've left behind a trail of
accounts in various Jirae and Confluences. Right now, the biggest
offender in their product lineup is <a href="https://www.atlassian.com/software/hipchat">HipChat</a>. My account is identified
by my email address, but it's bound up with an organization. If I want
to be part of my employer's HipChat as well as a client's, I have to
resort to multiple accounts signed up with plus addresses. It's great
that Gmail supports that, but I still can't log in to more than one
account at a time.
</p>

<p>
More generally, this is a failure in modeling. Somewhere along the
line, somebody drew a line between <code>Organization</code> and <code>Person</code> on
their model, with a one-to-many relationship. One <code>Organization</code> can
have many <code>Person</code> entities, but each <code>Person</code> <span class="underline">belongs to</span> exactly
one <code>Organization</code>.
</p>

<p>
I'll go even further. The proper way to approach this today is to
relate <code>Organization</code> and <code>Person</code> by way of another entity. Reify the
association! Is it employment? Put the start and end dates on the
employment. Oh, and don't delete the association once it
ends&#x2026; that's erasing it from history.
</p>
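
<p>
A minimal sketch of a reified association, in Python. The
<code>Employment</code> class, its field names, and the sample people and
organizations are hypothetical. I've assumed the end date is inclusive.
</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Organization:
    name: str


@dataclass(frozen=True)
class Employment:
    """The association itself is an entity with its own attributes."""
    person: Person
    organization: Organization
    start: date
    end: Optional[date] = None  # set when employment ends; the record is kept

    def active_on(self, day: date) -> bool:
        """True if the employment covers the given day (end date inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


alice = Person("Alice")
acme, globex = Organization("Acme"), Organization("Globex")
history = [
    Employment(alice, acme, date(2010, 1, 4), date(2014, 6, 30)),
    Employment(alice, globex, date(2014, 7, 1)),
]
current = [e.organization.name for e in history if e.active_on(date(2015, 4, 11))]
```

<p>
The ended employment stays in <code>history</code>. Asking who employs Alice
today is a query over the associations, not a field on the person.
</p>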

<p>
I think the default for pretty much any relationship these days should
be many-to-many. Particularly any data relationship that models a real
relationship in the external world. We shouldn't let the bad old days
of SQL join tables deter us from doing the right thing now.
</p>
]]></content></entry><entry><title>Glue Fleet and Compojure Together Using Protocols</title><link href="https://michaelnygard.com/blog/2011/01/glue-fleet-and-compojure-together-using-protocols/"/><id>https://michaelnygard.com/blog/2011/01/glue-fleet-and-compojure-together-using-protocols/</id><published>2011-01-15T14:09:08-06:00</published><updated>2011-01-15T14:09:08-06:00</updated><content type="html"><![CDATA[<p>Inspired by <a href="http://www.vanderburg.org/">Glenn Vanderburg</a>'s article on <a href="http://steve.vinoski.net/blog/2010/10/05/glenn-vanderburg-on-fleet-and-enlive/">Clojure templating frameworks</a>, I decided to try using <a href="http://github.com/Flamefork/fleet">Fleet</a> for my latest pet project. Fleet has a very nice interface. I can call a single function to create new Clojure functions for every template in a directory. That really makes the templates feel like part of the language. Unfortunately, Glenn's otherwise excellent article didn't talk about how to connect Fleet into Compojure or Ring. I chose to interpret that as a compliment, springing from his high esteem of our abilities.</p>

<p>My first attempt, just calling the template function directly as a route handler, resulted in the following:</p>

<pre>java.lang.IllegalArgumentException: No implementation of method: :render of protocol: #'compojure.response/Renderable found for class: fleet.util.CljString</pre>

<p>Ah, you've just got to love Clojure errors. After you understand the problem, you can always see that the error precisely described what was wrong. As an aid to helping you understand the problem... well, best not to dwell on that.</p>

<p>The clue is the protocol. Compojure knows how to turn many different things into valid response maps. It can <a href="http://github.com/weavejester/compojure/blob/ec835a1bb2fc1db961325336e1829e91139246e3/src/compojure/response.clj">handle</a> nil, strings, maps, functions, references, files, seqs, and input streams. Not bad for 22 lines of code!</p>

<p>There's probably a simpler way that I can't see right now, but I decided to have <a href="http://github.com/Flamefork/fleet/blob/87b32de67a52a4a49faa1de22254a1f503435624/src/fleet/util/CljString.java">CljString</a> support the same protocol.</p>

<script src="https://gist.github.com/781194.js?file=core.clj"></script>

<p>Take a close look at the call to <code>extend-protocol</code> on lines 12 through 15. I'm adding a protocol--which I didn't create--onto a Java class--which I also didn't create. My extension calls a function that was created at runtime, based on the template files in a directory. There's deep magic happening beneath those 3 lines of code.</p>

<p>Because I extended Renderable to cover CljString, I can use any template function directly as a route function, as in line 17. (The function <code>views/index</code> was created by the call to <code>fleet-ns</code> on line 10.)</p>

<p>So, I glued together two libraries without changing the code to either one, and without resorting to Factories, Strategies, or XML-configured injection.</p> 
]]></content></entry><entry><title>Metaphoric Problems in REST Systems</title><link href="https://michaelnygard.com/blog/2011/01/metaphoric-problems-in-rest-systems/"/><id>https://michaelnygard.com/blog/2011/01/metaphoric-problems-in-rest-systems/</id><published>2011-01-14T10:12:13-06:00</published><updated>2011-01-14T10:12:13-06:00</updated><content type="html"><![CDATA[<p>I used to think that metaphor was just a literary technique, that it was something you could use to dress up some piece of creative writing. Reading George Lakoff&rsquo;s <a href="https://www.amazon.com/gp/product/B001TI9FYE?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B001TI9FYE">Metaphors We Live By</a>, though, has changed my mind about that.</p>
<p>I now see that metaphor is not just something we use in writing; it&rsquo;s actually a powerful technique for structuring thought. We use metaphor when we are creating designs. We say that a class is like a factory, that an object is a kind of a thing. The thing may be an animal, it may be a part of a whole, or it may be representative of some real world thing.</p>
<p>All those are uses of metaphor, but there is a deeper structure of metaphors that we use every day, without even realizing it. We don&rsquo;t think of them as metaphors because in a sense these are actually the ways that we think. Lakoff uses the example of &ldquo;The tree is in front of the mountain.&rdquo; Perfectly ordinary sentence. We wouldn&rsquo;t think twice about saying it.</p>
<p>But the mountain doesn&rsquo;t actually have a front, and neither does the tree. Or if the mountain has a front, how do we know it&rsquo;s facing us? What we actually mean, if we unpack that metaphor, is something like, &ldquo;The distance from me to the tree is less than the distance from me to the mountain.&rdquo; Or, &ldquo;The tree is closer to me than the mountain is.&rdquo; That we assign that to being in front is actually a metaphoric construct.</p>
<p>When we say, &ldquo;I am filled with joy,&rdquo; we are actually using a double metaphor, two different metaphors related structurally. One is &ldquo;A Person Is A Container,&rdquo; the other is &ldquo;An Emotion Is A Physical Quantity.&rdquo; Together it makes sense to say, if a person is a container and emotion is a physical thing, then the person can be full of that emotion. In reality, of course, the person is no such thing. The person is full of all the usual things a person is full of: tissues, blood, bones, other fluids that are best kept on the inside.</p>
<p>But we are embodied beings, we have an inside and an outside and so we think of ourselves as a container with something on the inside.</p>
<p>This notion of containers is actually really important.</p>
<p>Because we are embodied beings, we tend to view other things as containers as well. It would make perfect sense to you if I said, &ldquo;I am in the room.&rdquo; The room is a container, the building is a container. The building contains the room. The room contains me. No problem.</p>
<p>It would also make perfect sense to you, if I said, &ldquo;That program is in my computer.&rdquo; Or we might even say, &ldquo;that video is on the Internet.&rdquo; As though the Internet itself were a container rather than a vast collection of wires and specialized computers.</p>
<p>None of these things are containers, but it&rsquo;s useful for us to think of them as such. Metaphorically, we can treat them as containers. This isn&rsquo;t just an abstraction about the choice of prepositions. Rather, the use of the prepositions, I think, reflects the way that we think about these things.</p>
<p>We also tend to think about our applications as containers. The contents that they hold are the features they provide. This has provided a powerful way of thinking about and structuring our programs for a long time. In reality, no such thing is happening. The program source text doesn&rsquo;t contain features. It contains instructions to the computer. The features are actually sort of emergent properties of the source text.</p>
<p>Increasingly the features aren&rsquo;t even fully specified within the source text. We went through a period for a while where we could pretend that everything was inside of an application. Take web systems for example. We would pretend that the source text specified the program completely. We even talked about application containers. There was always a little bit of fuzziness around the edges. Sure, most of the behavior was inside the container. But there were always those extra bits. There was the web server, which would have some variety of rules in it about access control, rewrite rules, ways to present friendly URLs. There were load balancers and firewalls. These active components meant that it was really necessary to understand more than the program text, in order to fully understand what the program was doing.</p>
<p>The more the network devices edged into Layer 7, previously the domain of the application, the more false the metaphor of program as container became. Look at something like a web application firewall. Or the miniature programs you can write inside of an F5 load balancer. These are functional behavior. They are part of the program. However, you will never find them in the source text. And most of the time, you don&rsquo;t find them inside the source control systems either.</p>
<p>Consequently, systems today are enormously complex. It&rsquo;s very hard to tell what a system is going to do once you put it into production, especially in those edge cases within hard-to-reach sections of the state space. We are just bad at thinking about emergent properties. It&rsquo;s hard to design properties to emerge from simple rules.</p>
<p>I think we&rsquo;ll find this most truly in RESTful architectures. In a fully mature REST architecture, the state of the system doesn&rsquo;t really exist in either the client or the server, but rather in the communication between the two of them. We say HATEOAS, &ldquo;Hypertext As The Engine Of Application State&rdquo; (which is a sort of shibboleth used to distinguish true RESTafarians from the rest of the world), but the truth is: what the client is allowed to do is told to it by the server at any point in time, and the next state transition is whatever the client chooses to invoke. Once we have that, the true behavior of the system can&rsquo;t actually be known just by the service provider.</p>
<p>In a REST architecture we follow an open world assumption. When we&rsquo;re designing the service provider, we don&rsquo;t actually know who all the consumers are going to be or what their individual and particular workflows may be. Therefore we have to design for a visible system, an open system that communicates what it can do, and what it has done at any point in time. Once we do that then the behavior is no longer just in the server. And in a sense it&rsquo;s not really in the client either. It&rsquo;s in the interaction between the two of them, in the collaborations.</p>
<p>That means the features of our system are emergent properties of the communication between these several parts. They&rsquo;re externalized. They&rsquo;re no longer in anything. There is no container. One could almost say there&rsquo;s no application. The features exist somewhere in the white space between those boxes on the architecture diagram.</p>
<p>I think we lack some of the conceptual tools for that as well. We certainly don&rsquo;t have a good metaphorical structure for thinking about behavior as a hive-like property emerging from the collaboration of these relatively independent and self-directed pieces of software.</p>
<p>I don&rsquo;t know where the next set of metaphors will come from. I do know that the attempt to force web-shaped systems into the &ldquo;application is container&rdquo; metaphor simply won&rsquo;t work anymore. In truth, it never worked all that well. But now it&rsquo;s broken down completely.</p>
]]></content></entry><entry><title>Metaphoric Problems in REST Systems (audio)</title><link href="https://michaelnygard.com/blog/2011/01/metaphoric-problems-in-rest-systems-audio/"/><id>https://michaelnygard.com/blog/2011/01/metaphoric-problems-in-rest-systems-audio/</id><published>2011-01-14T10:02:52-06:00</published><updated>2011-01-14T10:02:52-06:00</updated><content type="html"><![CDATA[<object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F9134712&amp;show_comments=true&amp;auto_play=false&amp;color=630420"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F9134712&amp;show_comments=true&amp;auto_play=false&amp;color=630420" type="application/x-shockwave-flash" width="100%"></embed> </object>   <span><a href="http://soundcloud.com/mtnygard/metaphoric-problems-in-rest-systems">Metaphoric problems in rest systems</a> by mtnygard</span> 
]]></content></entry><entry><title>Time motivates architecture</title><link href="https://michaelnygard.com/blog/2010/04/time-motivates-architecture/"/><id>https://michaelnygard.com/blog/2010/04/time-motivates-architecture/</id><published>2010-04-21T08:00:00-05:00</published><updated>2010-04-21T08:00:00-05:00</updated><content type="html"><![CDATA[<p>Let&rsquo;s engage in a thought experiment for a moment. Suppose that software was trivial to create and only ever needed to be used once. Completely disposable. So, somebody comes to you and says, &ldquo;I have a problem and I need you to solve it. I need a tool that will do blah-de-blah for a little while.&rdquo; You could think of the software the way that a carpenter thinks of a jig for cutting a piece of wood on a table saw, or a metalworker thinks of creating a jig to drill a hole at the right angle and depth.</p>
<p>If software were like this, you would never care about its architecture. You would spend a few minutes to create the thing that was needed, it would be used for the job at hand, and then it would be thrown away. It really wouldn&rsquo;t matter how good the software was on the inside&ndash;how easy it was to change&ndash;because you&rsquo;d never change it! It wouldn&rsquo;t matter how it adapted to changing business requirements, because you&rsquo;d just create a new one when the new requirement came up. In this thought experiment we wouldn&rsquo;t worry about architecture.</p>
<p>The key difference between this thought experiment and actual software? Of course, actual software is <em>not</em> disposable. It has a lifespan over some amount of time. Really, it&rsquo;s the time dimension that makes architecture important.</p>
<p>Over time, we need for many different people to work effectively in the software. Over time, we need the throughput of features to stay constant, or hopefully not decrease too much. Maybe it even increases in particularly nice cases. Over time, the business needs change so we need to adapt the software.</p>
<p>It&rsquo;s really time that makes us care about architecture.</p>
<p>Isn&rsquo;t it interesting then, that we never include time as a dimension in our architecture descriptions?</p>
]]></content></entry><entry><title>Circuit Breaker in Scala</title><link href="https://michaelnygard.com/blog/2010/04/circuit-breaker-in-scala/"/><id>https://michaelnygard.com/blog/2010/04/circuit-breaker-in-scala/</id><published>2010-04-21T06:00:00-05:00</published><updated>2010-04-21T06:00:00-05:00</updated><content type="html"><![CDATA[<p>FaKod (I think that translates as "The Fatalistic Coder"?) has written a nice Scala implementation of the Circuit Breaker pattern, and even better, has made it available on GitHub.</p>

<p>Check out <a href="http://github.com/FaKod/Circuit-Breaker-for-Scala">http://github.com/FaKod/Circuit-Breaker-for-Scala</a> for the code.</p>

<p>The Circuit Breaker can be mixed into any type. See <a href="http://wiki.github.com/FaKod/Circuit-Breaker-for-Scala/">http://wiki.github.com/FaKod/Circuit-Breaker-for-Scala/</a> for an example of usage.</p>
 
]]></content></entry><entry><title>The Future of Software Development</title><link href="https://michaelnygard.com/blog/2010/04/the-future-of-software-development/"/><id>https://michaelnygard.com/blog/2010/04/the-future-of-software-development/</id><published>2010-04-20T06:00:00-05:00</published><updated>2010-04-20T06:00:00-05:00</updated><content type="html"><![CDATA[<p>I&rsquo;ve been asked to sit on a panel regarding the future of software
development. This is always risky and makes me nervous, for two reasons. First, prediction is a notoriously low success-rate activity. Second, the people you always see making predictions like this are usually well past their &ldquo;use by&rdquo; date. Nevertheless, here is a collection of barely-related thoughts I have on that subject.</p>
<ul>
<li>
<p>Two obvious trends are cloud computing and mobile access. They are
complementary. As the number of people and devices on the net
increases, our ability to shape traffic on the demand side gets
worse. Spikes in demand will happen faster and reach higher levels
over time. Mobile devices exacerbate the demand side problems by
greatly increasing both the number of people on the net and the
fraction of their time they are able to access it.</p>
</li>
<li>
<p>Large traffic volumes both create and demand large data. Our tools
for processing tera- and petabyte datasets will improve
dramatically. Map/Reduce computing (a la Hadoop) has created
attention and excitement in this space, but it is ultimately just
one tool among many. We need better languages to help us think and
express large data problems. In particular, we need a language that
makes big data processing accessible to people with little
background in statistics or algorithms.</p>
</li>
<li>
<p>Speaking of languages, many of the problems we face today cannot be
solved inside a single language or application. The behavior of a
web site today cannot be adequately explained or reasoned about just
by examining the application code. Instead, a site picks up
attributes of behavior from a multitude of sources: application
code, web server configuration, edge caching servers, data grid
servers, offline or asynchronous processing, machine learning
elements, active network devices (such as application firewalls),
and data stores. &ldquo;Programming&rdquo; as we would describe it today&ndash;coding
application behavior in a request handler&ndash;defines a diminishing
portion of the behavior. We lack tools or languages to express and
reason about these distributed, extended, fragmented
systems. Consequently, it is difficult to predict the functionality,
performance, capacity, scalability, and availability of these
systems.</p>
</li>
<li>
<p>Some of this will be mitigated naturally as application-specific
functions disappear into tools and frameworks. Companies innovating
at the leading edge of scalability today are doing things in
application-specific behavior to compensate for deficiencies in
tools and platforms. For example, caching servers could arguably
disappear into storage engines and no-one would complain. In other
words, don&rsquo;t count the database vendors out yet. You&rsquo;ll see
key-value stores and in-memory data grid features popping up in
relational databases any day now.</p>
</li>
<li>
<p>In general, it appears that Objects will diminish as a programming
paradigm. Object-oriented programming will still exist&hellip; I&rsquo;m not
claiming &ldquo;the death of objects&rdquo; or something silly like
that. However, OO will become just one more paradigm among several,
rather than the dominant paradigm it has been for the last 15
years. &ldquo;Object oriented&rdquo; will no longer be synonymous with
&ldquo;good&rdquo;.</p>
</li>
<li>
<p>Some people have talked about &ldquo;polyglot programming&rdquo;. I think this
is a red herring. Polyglot is a reality, but it should not be a
goal. That is, programmers should know many languages and paradigms,
but deliberately mixing languages in a single application should be
avoided. What I think we will find instead is mixing of paradigms,
supported by a single primary language, with adjunct languages used
only as needed for specialized functions. For example, an
application written in Scala may mix OO, functional, and actor-based
concepts, and it may have portions of behavior expressed in SQL and
JavaScript. Nevertheless, it will still primarily be a Scala
application. The fact that Groovy, Scala, Clojure, and Java all run
on the Java Virtual Machine shouldn&rsquo;t mislead us into thinking that they
are interchangeable&hellip; or even interoperable!</p>
</li>
<li>
<p>Regarding Java. I fear that Java will have to be abandoned to the
&ldquo;Enterprise Development&rdquo; world. It will be relegated to the hands of
cut-rate business coders bashing out their gray business
applications for $30 / hour. We&rsquo;ve passed the tipping point on this
one. We used to joke that Java would be the next COBOL, but that
doesn&rsquo;t seem as funny now that it&rsquo;s true. Java will continue to
exist. Millions of lines of it will be written each year. It won&rsquo;t
be the driver of innovation, though. As individual programmers, I&rsquo;d
recommend that you learn another language immediately and
differentiate yourself from the hordes of low-skill, low-rent
outsource coders that will service the mainstream Java consumer.</p>
</li>
<li>
<p>Where will innovation come from? Although some of the blush seems to
be coming off Ruby, the reduction in hype has mainly allowed Ruby
and Ruby on Rails developers to knuckle down and <em>produce</em>. That
community continues to drive tremendous innovation. Many of the
interesting developments here relate to process. Ruby developers
have given us fantastic tools like Gems and Capistrano, that let
small teams outperform and outproduce groups four times their size.</p>
</li>
<li>
<p>To my great surprise, data storage has become a hotbed of innovation
in the last few years. Some of this is driven by the
high-scalability fetishists, which is probably the wrong reason for
98% of companies and teams. However, innovations around column
stores, graph databases, and key-value stores offer developers new
tools to reduce the impedance mismatch between their data storage
and their programming language. We spent twenty years trying to
squeeze objects into relational databases. Aside from the object
databases, which were an early casualty of Oracle&rsquo;s ascension, we
mostly focused on changing the application code through framework
after framework and ORM after ORM. It&rsquo;s refreshing to see storage
models that are easier to use and easier to modify.</p>
</li>
<li>
<p>This will also cause another flurry of &ldquo;reactive innovation&rdquo; from
the database vendors, just as we saw with &ldquo;Universal Databases&rdquo; in
the mid-90s. The big players here&ndash;Microsoft and Oracle&ndash;won&rsquo;t let
some schemaless little upstarts erode their market share. More
significantly, they aren&rsquo;t about to let their flagship products&ndash;and
the ones which give them beachheads inside every major
corporation&ndash;get intermediated by some open-source frameworks banged
up by the social network giants. Look for big moves by these vendors
into high scalability, agile storage, and eventual consistency
storage.</p>
</li>
</ul>
]]></content></entry><entry><title>Failover: Messy Realities</title><link href="https://michaelnygard.com/blog/2010/04/failover-messy-realities/"/><id>https://michaelnygard.com/blog/2010/04/failover-messy-realities/</id><published>2010-04-19T06:00:00-05:00</published><updated>2010-04-19T06:00:00-05:00</updated><content type="html"><![CDATA[		<p>
			People who don't live in operations can carry some funny misconceptions in their heads. Some of my personal faves:
		</p>
		<ul>
			<li>Just add some servers!
			</li>
			<li>I want a report of every configuration setting that's different between production and QA!
			</li>
			<li>We're going to make sure this (outage) never happens again!
			</li>
		</ul>
		<p>
			I've recently been reminded of this during some discussions about disaster recovery. This topic seems to breed misconceptions. Somewhere, I think most people carry around a mental model of failover that looks like this:
		</p>
		<p>
			<a href="/images/blog/failover/mental_model.png" target="_blank"><img src="/images/blog/failover/mental_model.png" style="border: none; float: none;" alt="Normal operations transitions directly and cleanly to failed over" width="297" height="345"></a>
		</p>
		<p>
			That is, failover is essentially automatic and magical.
		</p>
		<p>
			Sadly, there are many intermediate states that aren't found in this mental model. For example, there can be quite some time between failure and its detection. Depending on the detection and notification, there can be quite a delay before failover is initiated at all. (I once spoke with a retailer whose primary notification mechanism seemed to be the Marketing VP's wife.)
		</p>
		<p>
			Once you account for delays, you also have to account for faulty mechanisms. Failover itself often fails, usually due to configuration drift. Regular drills and failover exercises are the <em>only</em> way to ensure that failover works when you need it. When the failover mechanisms themselves fail, your system gets thrown into one of these terminal states that require manual recovery.
		</p>
		<p>
			Just off the cuff, I think the full model looks a lot more like this:
		</p>
		<p>
			<a href="/images/blog/failover/real_states.png" target="_blank"><img src="/images/blog/failover/real_states.png" style="border: none; float: none;" alt="Many more states exist in the real world, including failure of the failover mechanism itself." width="597" height="696"></a>
		</p>
		<p>
			It's worth considering each of these states and asking yourself the following questions:
		</p>
		<ul>
			<li>Is the state transition triggered automatically or manually?
			</li>
			<li>Is the transition step executed by hand or through automation?
			</li>
			<li>How long will the state transition take?
			</li>
			<li>How can I tell whether it worked or not?
			</li>
			<li>How can I recover if it didn't work?
			</li>
		</ul>
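		<p>
			One way to work through these questions is to write them down per transition and look for the blanks. The following is a minimal sketch, not a real tool: the state names, durations, and answers are hypothetical placeholders, and the <code>Transition</code> record and helper functions are assumptions made for illustration.
		</p>

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    trigger: str        # "automatic" or "manual"
    execution: str      # "automated" or "by hand"
    minutes: float      # how long the transition is expected to take
    verification: str   # how to tell whether it worked
    recovery: str       # how to recover if it did not


# Hypothetical answers for an imaginary system; replace with your own.
TRANSITIONS = [
    Transition("failed", "detected", "automatic", "automated", 5,
               "alert fires in monitoring", "rely on customer reports"),
    Transition("detected", "failover_started", "manual", "by hand", 30,
               "on-call engineer acknowledges the page", "escalate"),
    Transition("failover_started", "failed_over", "automatic", "automated", 15,
               "health checks pass at the secondary site", ""),
]


def unanswered(transitions):
    """Return (source, target, question) for every question left blank."""
    gaps = []
    for t in transitions:
        for f in fields(t):
            if getattr(t, f.name) == "":
                gaps.append((t.source, t.target, f.name))
    return gaps


def total_minutes(transitions):
    """Sum the expected duration along a path of transitions."""
    return sum(t.minutes for t in transitions)
```

		<p>
			In this made-up example the path from failure to failed over adds up to 50 minutes, and <code>unanswered(TRANSITIONS)</code> points at the missing recovery plan for the last step.
		</p>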
 
]]></content></entry><entry><title>Life's Little Frustrations</title><link href="https://michaelnygard.com/blog/2010/04/lifes-little-frustrations/"/><id>https://michaelnygard.com/blog/2010/04/lifes-little-frustrations/</id><published>2010-04-18T18:24:40-05:00</published><updated>2010-04-18T18:24:40-05:00</updated><content type="html"><![CDATA[<blockquote>A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. -Leslie Lamport</blockquote>

<p>
On my way to QCon Tokyo and <a href="http://www.qconbeijing.com/">QCon China</a>, I had some time to kill so I
headed over to Delta's Skyclub lounge. I've been a member for a few
years now. And why not? I mean, who could pass up tepid coffee, stale
party snacks, and a TV permanently locked to CNN?  Wait... that actually
doesn't sound like such a hot deal.
</p>

<p>
Oh! I remember, it's for the wifi access. (Well, that plus reliably
clean bathrooms, but we need not discuss that.) Being able to count on
wifi access without paying for yet another data plan has been pretty
helpful for me. (As an aside, I might change my tune once I try a <a href="http://www.verizonwireless.com/b2c/mobilebroadband/?page=products_mifi">mifi</a> box. Carrying my own hotspot sounds even better.)
</p>

<p>
Like most wifi providers, the Skyclub has a captive portal. Before you
can get a TCP/IP connection to anything, you have to submit a form
with a checkbox to agree to 89 pages of terms and conditions. I'm well
aware that Delta's lawyers are trying to make sure the company isn't
liable if I go downloading bootlegs of every <a href="http://en.wikipedia.org/wiki/List_of_Ally_McBeal_episodes">Ally McBeal episode</a>. But
I really don't know if these agreements are enforceable. For all I
know, page 83 has me agreeing to 7 years indentured servitude cleaning
Delta's toilets.
</p>

<p>
Anyway, Delta has outsourced operations of their wifi network to
Concourse Communications. And apparently, they've had an outage all
morning that has blocked anyone from using wifi in the Minneapolis
Skyclubs. When I submit the form with the checkbox, I get the
following error page:
</p>

<a target="_blank" href="/images/blog/skyclub/skyclub_error_1.png">
<img width="622" height="211" style="border: none; float: none;" src="/images/blog/skyclub/skyclub_error_1.png" /></a>

<p>
Including this bit of stacktrace:
</p>

<a target="_blank" href="/images/blog/skyclub/skyclub_error_2.png">
<img width="537" height="58" border="0" style="border: none; float: none;" src="/images/blog/skyclub/skyclub_error_2.png"/></a>

<p>
There's a lot to dislike here.
</p>

<ol>
<li>
Why is this yelling at me, the user? To anyone who <i>isn't</i> a web
site developer, this makes it sound like the user did something
wrong. There's a ton of scary language here: &quot;instance-specific
error&quot;, &quot;allow remote connections&quot;, &quot;Named Pipes Provider&quot;... heck,
this sounds like it's accusing the user of hacking servers.  &quot;Stack
trace&quot; sure sounds like the Feds are hot on somebody's trail, doesn't
it?
</li>

<li>
Isn't it fabulous to know that Ken keeps his projects on his D:
drive? If I had to lay bets, I'd say that Ken screwed up his
configuration string. In fact, the whole problem smells like a failed
deployment or poorly executed change. Ken probably pushed some code
out late on a Friday afternoon, then boogied out of town. My
prediction (totally unverifiable, of course) is that this problem will
take less than 5 minutes to resolve, once Ken gets his ass back from
the beach.
</li>

<li>
We mere users get to see quite a bit of internal
information here. Nothing really damaging, unless of course <a href="http://www.ormapper.net/">Wilson
ORMapper</a> has some security defects or something like that.
</li>

<li>
Stepping back from this specific error message, we have the larger
question: is it sensible to couple availability of the network to the
availability of this check-the-box application? Accessing the network
is the primary purpose of this whole system. It is the most critical
feature. Is collecting a compulsory boolean &quot;true&quot; from every user
really as important as the reason the whole damn thing was built in
the first place? Of course not!  (As an aside, this is an example of <a href="http://en.wikipedia.org/wiki/Systemantics">Le Chatelier's Principle</a>: &quot;Complex systems tend to oppose their own proper function.&quot;)
</li>
</ol>

<p>
We see this kind of operational coupling all the time. Non-critical
features are allowed to damage or destroy critical features. Maybe
there's a single thread pool that services all kinds of requests,
rather than reserving a separate pool for the important things. Maybe
a process is overly linearized and doesn't allow for secondary,
after-the-fact processing. Or, maybe a critical and a non-critical
system both share an enterprise service---producing a common-mode
dependency.
</p>

<p>
Whatever the proximate cause, the underlying problem is
lack of diligence in operational decoupling.
</p> 
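<p>
As an illustration of the separate-pool idea, here is a minimal sketch using Python's standard <code>concurrent.futures</code> module. The function names, pool sizes, and the captive-portal scenario are hypothetical; the point is only that the critical path does not wait on the non-critical work.
</p>

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Separate pools: a stall in one cannot exhaust threads in the other.
critical_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="critical")
noncritical_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="noncritical")


def grant_network_access(user):
    """Critical feature (hypothetical): let the user onto the network."""
    return f"{user}: access granted"


def record_terms_acceptance(user, release):
    """Non-critical feature (hypothetical): may be slow or hung.

    The wait on `release` stands in for a slow dependency such as a database.
    """
    release.wait(timeout=5)
    return f"{user}: terms recorded"


def handle_login(user, release):
    # Queue the non-critical work on its own pool and do not wait for it.
    audit = noncritical_pool.submit(record_terms_acceptance, user, release)
    # The critical work runs on a pool reserved for it.
    access = critical_pool.submit(grant_network_access, user)
    return access.result(timeout=1), audit
```

<p>
With this arrangement, <code>handle_login</code> returns the access result even while the terms-recording task is still blocked. Whether that trade-off is acceptable is a business decision, not a technical one.
</p>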
]]></content></entry><entry><title>Topics in Architecture</title><link href="https://michaelnygard.com/blog/2010/01/topics-in-architecture/"/><id>https://michaelnygard.com/blog/2010/01/topics-in-architecture/</id><published>2010-01-03T17:16:06-06:00</published><updated>2010-01-03T17:16:06-06:00</updated><content type="html"><![CDATA[<p>I&rsquo;m working on a syllabus for an extensive course on web architecture.  This will be for experienced programmers looking to become architects.</p>
<p>Like all of my work about architecture, this covers technology, business, and strategic aspects, so there&rsquo;s an emphasis on creating high-velocity, competitive organizations.</p>
<p>In general, I&rsquo;m aiming for a mark that&rsquo;s just behind the bleeding edge.  So, I&rsquo;m including several of the NoSQL persistence technologies, for example, but not including Erjang because it&rsquo;s too early.  (Or is that &ldquo;erl-y&rdquo;? )</p>
<p>(What I&rsquo;d really love to do is make a screencast series out of all of these.  I&rsquo;m daunted, though.  There&rsquo;s a lot of ground to cover here!)</p>
<p>EDIT: Added functional and OO styles of programming. (Thanks <a href="http://www.twitter.com/deanwampler">@deanwampler</a>.)  Added JRuby/Java under languages.  (Thanks <a href="http://www.twitter.com/glv">@glv</a>.)</p>
<p>I&rsquo;m interested in hearing your feedback.  What would you add? Remove?</p>
<ul>
<li>
<p>Methods and Processes</p>
<ul>
<li>Systems Thinking/Learning Organization</li>
<li>High Velocity Organizations</li>
<li>Safety Culture</li>
<li>Error-Inducing Systems (&ldquo;Normal Accidents&rdquo;)</li>
<li>Points of Leverage</li>
<li>Fundamental Dynamics: Iteration, Variation, Selection, Feedback, Constraint</li>
<li>5D architecture</li>
<li>Failures of Intuition</li>
<li>ToC</li>
<li>Critical Chain</li>
<li>Lean Software Development</li>
<li>Real Options</li>
<li>Strategic Navigation</li>
<li>OODA</li>
<li>Tempo, Adaptation</li>
<li>XP</li>
<li>Scrum</li>
<li>Lean</li>
<li>Kanban</li>
<li>TDD</li>
</ul>
</li>
<li>
<p>Architecture Styles</p>
<ul>
<li>REST / ROA</li>
<li>SOA</li>
<li>Pipes &amp; Filters</li>
<li>Actors</li>
<li>App-server centric</li>
<li>Event-Driven Architecture</li>
</ul>
</li>
<li>
<p>Web Foundations</p>
<ul>
<li>The &ldquo;architecture&rdquo; of the web</li>
<li>HTTP 1.0 &amp; 1.1</li>
<li>Browser fetch behaviors</li>
<li>HTTP Intermediaries</li>
</ul>
</li>
<li>
<p>The Nature of the Web</p>
<ul>
<li>Crowdsourcing</li>
<li>Folksonomy</li>
<li>Mashups/APIs/Linked Open Data</li>
</ul>
</li>
<li>
<p>Testing</p>
<ul>
<li>TDD</li>
<li>Unit testing</li>
<li>BDD/Spec testing</li>
<li>ScalaCheck</li>
<li>Selenium</li>
</ul>
</li>
<li>
<p>Persistence</p>
<ul>
<li>Redis</li>
<li>CouchDB</li>
<li>Neo4J</li>
<li>eXist</li>
<li>&ldquo;Web-shaped&rdquo; persistence</li>
</ul>
</li>
<li>
<p>Technical architecture</p>
<ul>
<li>8 Fallacies of Distributed Computing</li>
<li>CAP Theorem</li>
<li>Scalability</li>
<li>Reliability</li>
<li>Performance</li>
<li>Latency</li>
<li>Capacity</li>
<li>Decoupling</li>
<li>Safety</li>
</ul>
</li>
<li>
<p>Languages and Frameworks</p>
<ul>
<li>Spring</li>
<li>Groovy/Grails</li>
<li>Scala
<ul>
<li>Lift</li>
</ul>
</li>
<li>Clojure
<ul>
<li>Compojure</li>
</ul>
</li>
<li>JRuby
<ul>
<li>Rails</li>
</ul>
</li>
<li>OSGi</li>
</ul>
</li>
<li>
<p>Design</p>
<ul>
<li>Code Smells</li>
<li>Object Thinking</li>
<li>Object Design</li>
<li>Functional Thinking</li>
<li>API Design</li>
<li>Design for Operations</li>
<li>Information Hiding</li>
<li>Recognizing Coupling</li>
</ul>
</li>
<li>
<p>Deployment</p>
<ul>
<li>Physical</li>
<li>Virtual</li>
<li>Multisite</li>
<li>Cloud (AWS)</li>
<li>Chef</li>
<li>Puppet</li>
<li>Capistrano</li>
</ul>
</li>
<li>
<p>Build and Version Control</p>
<ul>
<li>Git</li>
<li>Ant</li>
<li>Maven</li>
<li>Leiningen</li>
<li>Private repos</li>
<li>Collaboration across projects</li>
</ul>
</li>
</ul>
]]></content></entry><entry><title>"If the last one goes, we'll be up here all night!"</title><link href="https://michaelnygard.com/blog/2009/12/if-the-last-one-goes-well-be-up-here-all-night/"/><id>https://michaelnygard.com/blog/2009/12/if-the-last-one-goes-well-be-up-here-all-night/</id><published>2009-12-18T15:54:21-06:00</published><updated>2009-12-18T15:54:21-06:00</updated><content type="html"><![CDATA[<p>There&rsquo;s an old joke about a couple of folks on a plane who hear the captain successively announce that they&rsquo;ve lost one, two, then three engines. Each time, he reassures the passengers that they&rsquo;re OK, but will be progressively later to land.  After losing the third engine, one passenger tells the other, &ldquo;If the last one goes, we&rsquo;ll be up here all night!&rdquo;</p>
<p>It&rsquo;s a remarkable aircraft that can fly on just one out of four engines. Most four-engine jets need at least two to cruise.  (I&rsquo;ve been told that they can make a controlled descent on one engine, but can&rsquo;t maintain altitude.)</p>
<p>Likewise, your web app probably needs more than just one functioning server to handle demand. The usual approach to computing availability is to compute the odds that at least one server survives:</p>
<img alt="rel_3-1a.png" src="/images/blog/reliability/rel_3-1a.png" width="170" height="23" />
<p>If all the servers are identical, meaning that we expect them to have the same failure rate, then this reduces to the more familiar form:</p>
<img alt="rel_3-1b.png" src="/images/blog/reliability/rel_3-1b.png" width="148" height="19" />
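<p>For readers who prefer code to formulas, here is a small sketch of the same calculation. It assumes server failures are independent, and the function names are made up for illustration. The third function is an extension that is not in the formulas above: it computes the odds that at least <code>k</code> of <code>n</code> identical servers survive, which is closer to what a web app that needs more than one server cares about.</p>

```python
from math import comb, prod


def availability_at_least_one(availabilities):
    """Odds that at least one server is up, given each server's availability."""
    return 1 - prod(1 - a for a in availabilities)


def availability_identical(a, n):
    """Same calculation when all n servers share availability a."""
    return 1 - (1 - a) ** n


def availability_at_least_k(a, n, k):
    """Odds that at least k of n identical servers are up (binomial sum)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))
```

<p>With three servers at 0.9 availability each, the odds that at least one is up come to 0.999, but the odds that at least two are up drop to 0.972.</p>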
]]></content></entry><entry><title>Coupling and Coevolution</title><link href="https://michaelnygard.com/blog/2009/12/coupling-and-coevolution/"/><id>https://michaelnygard.com/blog/2009/12/coupling-and-coevolution/</id><published>2009-12-03T11:12:03-06:00</published><updated>2009-12-03T11:12:03-06:00</updated><content type="html"><![CDATA[<p>The mighty Mississippi River starts in Minnesota, at Lake Itasca. Every kid in Minnesota has to make the ritual pilgrimage to <a href="http://www.dnr.state.mn.us/state_parks/itasca/index.html">Itasca State Park</a> at some point, where wading across North America&rsquo;s longest river is a rite of passage.</p>
<p><img src="https://farm4.static.flickr.com/3199/2369068347_2de8466d30_m.jpg" alt="Mississippi River Starts Here"></p>
<p>One of the very interesting things in Itasca State Park is a section of forest that is fenced off so that deer cannot enter it. It&rsquo;s part of a decades-long experiment to see how forests are affected by browsing herbivores.  What&rsquo;s really interesting is that not only is the quantity of plants different inside the protected area, but the types of plants and trees are different, too. Because deer prefer to nibble on younger trees, fewer saplings survive in the main body of the forest than in the fenced-off portion. Outside the fence, the distribution of tree size and age is biased toward older trees. The population of trees is weighted more toward resinous species like pines, which deer prefer not to eat. Inside the fence, more saplings survive into young maturity, so you see a more even distribution of tree ages and a wider diversity of species represented in the mature trees. The changes in the canopy affect the ground cover which, in turn, changes how deer could (if allowed) reach the trees and browse them.</p>
<p>So, here&rsquo;s a feedback loop that involves deer, trees, leaves and brush. The net result is a different ecosystem (albeit a slightly artificial one.)</p>
<p>Most physical and biological systems are like this in several ways, particularly relating to feedback. In our artificial systems (electrical, mechanical, symbolic, or semantic) we build in feedback mechanisms as a deliberate control. These are often one dimensional, proportional, and negative.</p>
<p>In natural systems, feedback arises everywhere. Sometimes, it proves to be helpful for the long-term stability of the system. In which case, the feedback itself gets reinforced by the existence and perpetuation of the system it exists within. In a sense, the system adapts to reinforce beneficial feedback. Conversely, feedback webs that cause too much instability will, like an overly aggressive virus, lead to destruction of their host system and disappear. So, we can see the constituents of a system co-evolving with each other and the system itself.</p>
<p>The old &ldquo;microphone-amplifier-speaker-squealing&rdquo; example of feedback really fails here. We lack both language and metaphor to really grasp this kind of interaction over time. In part, I think that&rsquo;s because we like to separate the world into isolated components and only talk about components at a single level of abstraction.  The trouble is that abstractions like &ldquo;level of abstraction&rdquo; only exist in our minds.</p>
<p>Here&rsquo;s another example of coevolution, courtesy of Jared Diamond in &ldquo;Guns, Germs, and Steel&rdquo;. I&rsquo;ll apologize in advance for oversimplifying; I&rsquo;m devoting a paragraph to an argument he develops across entire chapters.</p>
<p>At some point, a group of nomads decided that the seeds of these particular grasses were tasty. In collecting the grasses, they spread them around. Some kinds of seeds survived the winter better and responded well to being sown by humans. Now, nobody sat down and systematically picked out which seeds grew better or worse. They didn&rsquo;t have to, because the seeds that grew better produced more seeds for the next generation. Over time, a tiny difference (fractions of a percent) in productivity would lead some strains to supplant the others. Meanwhile, inextricably linked, some humans figured out how to plant, harvest, and eat these early grains. These humans had an advantage over their neighbors, so they were able to feed more babies.  That turns out to be a benefit, because farming is hard work and requires more offspring to help produce food.  (Another feedback loop.) Oh, and this kind of labor makes it advantageous to keep livestock, too. Over time, these farmers would breed and feed more children than the nomads, so farmers would come to be a larger and larger percentage of the population.  Just as an added wrinkle, keeping livestock and fertilizing fields both lead to diseases that simultaneously harm the individuals and occasionally decimate the population, but also provide some long-term benefits such as better disease resistance and inadvertent biological warfare when encountering other civilizations.</p>
<p>Try to diagram the feedback loops here: nomads, farmers, livestock, grains, birthrates, and so on. Everything is connected to everything else. It&rsquo;s really hard to avoid slipping into teleological language here. We&rsquo;ve got feedback and feedforward at several different levels and timescales here, from the scale of microbes to livestock to civilizations, and across centuries. This dynamic altered the course of many species&rsquo; evolution: cattle, wheat, maize, and yes, good old H. sapiens.</p>
<p>This complexity of interaction extends to planetary and <a href="http://www.scientificamerican.com/article.cfm?id=exoplanets-lithium">stellar</a> levels as well. At some sufficiently long time scale, the intergalactic medium is coupled to our planetary ecosystem.</p>
<p>The human intellectual penchant for decomposition, isolation, and leveled abstraction is purely an artifact of the size of our bodies and the duration of our lives.</p>
]]></content></entry><entry><title>GMail Outage Was a Chain Reaction</title><link href="https://michaelnygard.com/blog/2009/09/gmail-outage-was-a-chain-reaction/"/><id>https://michaelnygard.com/blog/2009/09/gmail-outage-was-a-chain-reaction/</id><published>2009-09-02T09:25:03-05:00</published><updated>2009-09-02T09:25:03-05:00</updated><content type="html"><![CDATA[<p>Google has published an <a href="http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html">explanation of the widespread GMail outage</a> from September 1st. In this explanation, they trace the root cause to a layer of &ldquo;request routers&rdquo;:</p>
<blockquote>
<p>&hellip;a few of the request routers became overloaded and in effect told the rest of the system &ldquo;stop sending us traffic, we&rsquo;re too slow!&rdquo;. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.</p>
</blockquote>
<p>This perfectly describes the &ldquo;Chain Reaction&rdquo; stability antipattern from <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">Release It</a>!</p>
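<p>The dynamic in Google&rsquo;s description can be shown with a toy simulation. This is only a sketch under simplifying assumptions that are mine, not Google&rsquo;s: load is spread evenly across the routers still in service, and any router whose share exceeds its capacity drops out entirely. The function name and the numbers are hypothetical.</p>

```python
def simulate_chain_reaction(total_load, capacities):
    """Return how many routers are still serving once the cascade settles.

    Load is split evenly among the routers still in service. A router whose
    share exceeds its capacity drops out, and its load shifts to the rest.
    """
    alive = list(capacities)
    while alive:
        share = total_load / len(alive)
        survivors = [c for c in alive if c >= share]
        if len(survivors) == len(alive):
            return len(alive)  # stable: nobody is overloaded
        alive = survivors      # the overloaded routers shed their load
    return 0
```

<p>Ten routers with capacity 100 each carry a total load of 850 without trouble. Make just two of them a little weaker (capacity 60) and the same load takes down all ten: the two weak routers drop out, the remaining eight each receive more than 100, and the rest follow.</p>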
]]></content></entry><entry><title>Hadoop versus VPN</title><link href="https://michaelnygard.com/blog/2009/07/hadoop-versus-vpn/"/><id>https://michaelnygard.com/blog/2009/07/hadoop-versus-vpn/</id><published>2009-07-31T10:03:13-05:00</published><updated>2009-07-31T10:03:13-05:00</updated><content type="html"><![CDATA[<p>I&rsquo;ve been doing some work with <a href="http://hadoop.apache.org">Hadoop</a> lately, and I just ran into an interesting problem with networking.  This isn&rsquo;t a bug, per se, but a conflict in my configuration.</p>
<p>I&rsquo;m running on a laptop, using a pseudo-distributed cluster. That means all the different processes are running, but they&rsquo;re all running on one box. That makes it possible to test jobs with full network communication, but without deploying to a production cluster.</p>
<p>I&rsquo;m also working remotely, connecting to the corporate network by VPN. As is commonly done, our VPN is configured to completely separate the client machine from its local network. (If it didn&rsquo;t, you could use the VPN machine to bridge the secure corporate network to your home ISP, coffeeshop, airport, etc.)</p>
<p>Here&rsquo;s the problem: when on the VPN, my machine can&rsquo;t talk to its own IP address. Right now, <code>ifconfig</code> reports the laptop&rsquo;s IP address as 192.168.1.105.  That&rsquo;s the address associated with the physical NIC on the machine.</p>
<p>The odd part is that Hadoop <em>mostly</em> works this way. I&rsquo;ve configured the name node, job tracker, task tracker, datanodes, etc. to all use &ldquo;localhost&rdquo;. I can use HDFS, I can submit jobs, and all the map tasks work fine. The only problem is that when the map tasks finish, the task tracker cannot send data from the map tasks to the reduce tasks.  The job appears to hang.</p>
<p>In the task tracker&rsquo;s log file, I see reports every 20 seconds or so that say</p>
<pre><code>2009-07-31 11:01:33,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200907310946_003_r_000000_0 0.0% reduce &gt; copy &gt;
</code></pre>
<p>The instant I disconnected from the VPN, the copy proceeded and the reduce job ran.</p>
<p>I&rsquo;m sure there&rsquo;s a configuration property somewhere within Hadoop that I can change. When (if) I find it, I&rsquo;ll update this post.</p>
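<p>In the meantime, a quick way to confirm the symptom outside of Hadoop is to check whether the machine can open a TCP connection to itself on a given address. The following is a minimal diagnostic sketch using only Python&rsquo;s standard <code>socket</code> module. The function name is made up, and it says nothing about which address or property Hadoop actually uses for the copy phase.</p>

```python
import socket


def can_connect_to_self(address, timeout=2.0):
    """Listen on `address` and try to connect to that listener locally.

    Returns False if the address cannot be bound or the connection fails.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((address, 0))  # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            # The connection completes against the listen backlog,
            # so no accept() call is needed for this check.
            with socket.create_connection((address, port), timeout=timeout):
                return True
    except OSError:
        return False
```

<p>Comparing <code>can_connect_to_self("127.0.0.1")</code> with the result for the address that <code>ifconfig</code> reports, both on and off the VPN, would show whether the VPN is what blocks the machine from reaching its own IP.</p>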
]]></content></entry><entry><title>An AspectJ Circuit Breaker</title><link href="https://michaelnygard.com/blog/2009/07/an-aspectj-circuit-breaker/"/><id>https://michaelnygard.com/blog/2009/07/an-aspectj-circuit-breaker/</id><published>2009-07-16T09:01:15-05:00</published><updated>2009-07-16T09:01:15-05:00</updated><content type="html"><![CDATA[<p>Spiros Tzavellas pointed me to his implementation of Circuit Breaker. His approach uses AspectJ and can be applied using a bytecode weaver or AspectJ compiler.  He's also got unit tests with 85% coverage.</p>

<p>Spiros' project page is <a href="http://www.tzavellas.com/projects/circuit-breaker/">here</a>, and the code is (where else?) on <a href="http://github.com/sptz45/circuit-breaker/tree/master">GitHub</a>.  He appears to be quite actively developing the project.</p>

 
]]></content></entry><entry><title>Two New Circuit Breaker Implementations</title><link href="https://michaelnygard.com/blog/2009/07/two-new-circuit-breaker-implementations/"/><id>https://michaelnygard.com/blog/2009/07/two-new-circuit-breaker-implementations/</id><published>2009-07-16T07:35:45-05:00</published><updated>2009-07-16T07:35:45-05:00</updated><content type="html"><![CDATA[<p>The excellent Will Sargent has created a Circuit Breaker gem that's quite nice. You can read the docs at rdoc.info. He's released the code (under LGPL) on <a href="http://github.com/wsargent/circuit_breaker/tree/master">GitHub</a>.
</p>

<p>
The other one has actually been out for a couple of months now, but I forgot to blog about it.  Scott Vlamnick created a Grails plugin that uses AOP to weave Circuit Breaker functionality as "around" advice. This one can also report its state via JMX.  In a particularly nice feature, this plugin supports different configurations in different environments.
</p> 
]]></content></entry><entry><title>Workmen, tools, etc.</title><link href="https://michaelnygard.com/blog/2009/05/workmen-tools-etc./"/><id>https://michaelnygard.com/blog/2009/05/workmen-tools-etc./</id><published>2009-05-20T20:17:03-05:00</published><updated>2009-05-20T20:17:03-05:00</updated><content type="html"><![CDATA[<p>We&rsquo;ve all heard the old saw, &ldquo;It&rsquo;s a poor workman that blames his tools.&rdquo; Let&rsquo;s think about that for a minute. Does it actually mean that a skilled craftsman can do great work with shoddy implements?</p>
<p>Well, can a chef make a souffle with a skillet?</p>
<p>Can a cabinetmaker round an edge with dull router bits?</p>
<p>I&rsquo;m not going to rule it out.  Perhaps there&rsquo;s a brilliant chef who&mdash;at this very moment&mdash;is preparing to introduce the world to the &ldquo;skiffle.&rdquo;  And, it&rsquo;s possible that one could coax a dull router into making a better quarter round through care, attention, and good speed control.</p>
<p>Going by the odds, though, I&rsquo;d bet on scrambled eggs and splinters.</p>
<p>Like a lot of old sayings, this one doesn&rsquo;t make much sense in its usual interpretation. Most people take this proverb to mean that you should be able to turn out top-notch work with whatever tools you&rsquo;re given. It&rsquo;s an excuse for bad tools, or lack of interest in improving them.</p>
<p>This homily dates back to a time when workers would bring their own tools to the job, leading to the popular origin story for the phrase &ldquo;getting sacked&rdquo;. (No comments about møøse bites, please.) Some crafts have evaded the assembly line, and in those, craftsmen still bring their own tools. Chefs bring their prized knives. Fine carpenters bring their own hand and bench tools.</p>
<p>There is a grain of truth in the common interpretation that good tools don&rsquo;t make a good workman. There&rsquo;s another level of truth under the surface, though.  The 13th Century French version of this saying translates as, &ldquo;A bad workman will never find a good tool.&rdquo;  I like this version a lot better. Tools cannot make one good, but bad tools can hurt a good worker&rsquo;s performance. That sounds a lot less like &ldquo;quit whining and use whatever&rsquo;s at hand,&rdquo; doesn&rsquo;t it?</p>
<p>On the other hand, if you supply your own tools, you&rsquo;re not as likely to tolerate bad ones, are you?  I think this is the most important interpretation. Good workers&mdash;if given the choice&mdash;will select the best tools and keep them sharp.</p>
]]></content></entry><entry><title>Minireview: Beginning Scala</title><link href="https://michaelnygard.com/blog/2009/05/minireview-beginning-scala/"/><id>https://michaelnygard.com/blog/2009/05/minireview-beginning-scala/</id><published>2009-05-18T14:41:57-05:00</published><updated>2009-05-18T14:41:57-05:00</updated><content type="html"><![CDATA[<p>As you can probably tell from my recent posts, I&rsquo;ve been learning Scala.  I recently dug into another Scala book, <a href="http://www.amazon.com/gp/product/1430219890?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1430219890">Beginning Scala</a> by David Pollak.</p>
<p>Beginning Scala is a nice, gentle introduction to this language. It takes a gradual, example driven approach that emphasizes running code early.  This makes it a good intro for people who want to <em>use</em> the language for applications first, then worry about creating frameworks later.</p>
<p>Don&rsquo;t let that fool you, though. Pollak gets to the sophisticated parts soon enough.  I particularly like an example of creating a new &ldquo;control structure&rdquo; to execute stuff in the context of a JDBC connection. This puts some meat on the argument that Scala is a &ldquo;scalable language.&rdquo;  Where other languages either implement this as a keyword (as in Groovy&rsquo;s &ldquo;with&rdquo;) or a framework (Spring&rsquo;s &ldquo;templates&rdquo;), here it can be added with one page of example code.</p>
<p>Beginning Scala also has a very thorough discussion of actors. I appreciate this, because actors were my main motivation for learning Scala in the first place.</p>
<p>Pollak separates the act of consuming a library from that of creating a library. He advises us to worry most about types, traits, co- and contravariance, etc. mainly when we are creating libraries. True to this notion, chapter 7 is called &ldquo;Traits and Types and Gnarly Stuff for Architects&rdquo;. It doesn&rsquo;t sound like much fun, but it is important material.  I find that Scala makes me think more about the type system than other languages.  It&rsquo;s strongly, and statically, typed.  (So much so, in fact, that it makes me realize just how loose Java&rsquo;s own type system is.) As such, it pays to have a firm understanding of how code turns into types.  Scala has a rich set of tools for building an expressive type system, but there is also complexity there.  Checking in at 60 pages, this chapter covers Scala&rsquo;s tools along with guidance on good styles and idioms.</p>
<p>Interestingly, although there is a <a href="http://liftweb.net/">Lift</a> logo on the cover, there&rsquo;s nothing about Lift in the book itself. Considering that Pollak is the creator of Lift, it&rsquo;s curious that this book doesn&rsquo;t deal with it.  Perhaps that&rsquo;s being left for another title.</p>
<p>Overall, I endorse Beginning Scala.</p>
]]></content></entry><entry><title>Units of Measure in Scala</title><link href="https://michaelnygard.com/blog/2009/05/units-of-measure-in-scala/"/><id>https://michaelnygard.com/blog/2009/05/units-of-measure-in-scala/</id><published>2009-05-07T22:00:09-05:00</published><updated>2009-05-07T22:00:09-05:00</updated><content type="html"><![CDATA[<p>Failure to understand or represent units has caused several <a href="http://www.devtopics.com/20-famous-software-disasters-part-2/">major disasters</a>, including the costly Ariane 5 disaster in 1996. This is one of those things that DSLs often get right, but mainstream programming languages just ignore.  Or, worse, they implement a clunky unit of measure library that ensures you can never again write a sensible arithmetic expression.</p>
<p>While I was at <a href="http://jaoo.com.au">JAOO Australia</a> this week, <a href="http://www.pandamonial.com/">Amanda Laucher</a> showed some F# code for a recipe that caught my attention. It used numeric literals that directly attached units to quantities.  What&rsquo;s more, it was intelligent about combining units.</p>
<p>I went looking for something similar in Scala.  I googled my fingertips off, but without much luck, until <a href="http://twitter.com/milessabin">Miles Sabin</a> pointed out that there&rsquo;s already a compiler plugin sitting right next to the core Scala code itself.</p>
<h2 id="installing-units">Installing Units</h2>
<p>Scala has its own package manager, called sbaz.  It can directly install the units extension:</p>
<p><code>sbaz install units</code></p>
<p>This will install it under your default managed installation. If you haven&rsquo;t done anything else, that will be your Scala install directory.  If you have done something else, you probably already know what you&rsquo;re doing, so I won&rsquo;t try to give you instructions.</p>
<h2 id="using-units">Using Units</h2>
<p>To use units, you first have to import the library&rsquo;s &ldquo;Preamble&rdquo;.  It&rsquo;s also helpful to go ahead and import the &ldquo;StandardUnits&rdquo; object.  That brings in a whole set of useful SI units.</p>
<p>I&rsquo;m going to do all this from the Scala interactive interpreter.</p>
<pre><code>scala&gt; import units.Preamble._
import units.Preamble._

scala&gt; import units.StandardUnits._
import units.StandardUnits._
</code></pre>
<p>After that, you can multiply any number by a unit to create a dimensional quantity:</p>
<pre><code>scala&gt; 20*m
res0: units.Measure = 20.0*m

scala&gt; res0*res0
res1: units.Measure = 400.0*m*m

scala&gt; Math.Pi*res0*res0
res2: units.Measure = 1256.6370614359173*m*m
</code></pre>
<p>Notice that when I multiplied a length (in meters) times itself, I got an area (square meters).  To me, this is a really exciting thing about the units library.  It can combine dimensions sensibly when you do math on them.  In fact, it can help prevent you from incorrectly combining units.</p>
<pre><code>scala&gt; val length = 5*mm
length: units.Measure = 5.0*mm

scala&gt; val weight = 12*g
weight: units.Measure = 12.0*g

scala&gt; length + weight
units.IncompatibleUnits: Incompatible units: g and mm
</code></pre>
<p>I can&rsquo;t add grams and millimeters, but I can multiply them.</p>
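<p>To make the idea concrete, here is a minimal sketch of dimension tracking. It is written in Python rather than Scala, it is not how the units plugin is implemented, and the <code>Measure</code> class and its fields are hypothetical names invented for illustration. Each quantity carries a map of unit exponents; addition requires identical maps, while multiplication adds the exponents together.</p>

```python
class Measure:
    """A number tagged with unit exponents, e.g. {"m": 2} for square meters."""

    def __init__(self, value, units):
        self.value = float(value)
        # Drop zero exponents so cancelled units disappear from the map.
        self.units = {unit: power for unit, power in units.items() if power != 0}

    def __add__(self, other):
        # Addition only makes sense when both sides carry the same units.
        if self.units != other.units:
            raise TypeError(f"Incompatible units: {other.units} and {self.units}")
        return Measure(self.value + other.value, self.units)

    def __mul__(self, other):
        # Multiplication combines units by adding their exponents.
        exponents = dict(self.units)
        for unit, power in other.units.items():
            exponents[unit] = exponents.get(unit, 0) + power
        return Measure(self.value * other.value, exponents)


length = Measure(5, {"mm": 1})
weight = Measure(12, {"g": 1})

product = length * weight                              # allowed: result carries both units
area = Measure(20, {"m": 1}) * Measure(20, {"m": 1})   # units become {"m": 2}

try:
    length + weight                                    # rejected: mm and g do not match
    addition_rejected = False
except TypeError:
    addition_rejected = True
```

<p>This sketch compares unit names only, so it would also refuse to add meters to millimeters. A real library has to know which units share a dimension and how to convert between them.</p>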
<h2 id="creating-units">Creating Units</h2>
<p>The StandardUnits package includes a lot of common units relating to basic physics.  It doesn&rsquo;t have any relating to system capacity metrics, so I&rsquo;d like to create some units for that.</p>
<pre><code>scala&gt; import units._
import units._

scala&gt; val requests = SimpleDimension(&quot;requests&quot;)
requests: units.SimpleDimension = requests

scala&gt; val req = SimpleUnit(&quot;req&quot;, requests, 1.0)
req: units.SimpleUnit = req

scala&gt; val Kreq = SimpleUnit(&quot;Kreq&quot;, requests, 1000.0)
Kreq: units.SimpleUnit = Kreq
</code></pre>
<p>Now I can combine that simple dimension with others.  If I want to express requests per second, I can just write it directly.</p>
<pre><code>scala&gt; 565*req/s
res4: units.Measure = 565.0*req/s
</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>This extension will be the first thing I add to new projects from now on.  The convenience of literals, combined with the ability to add my own dimensions and units, means I can easily keep units with all of my numbers.</p>
<p>There&rsquo;s no longer any excuse to neglect your units in a mainstream programming language.</p>
]]></content></entry><entry><title>Kudos to Relevance and Clojure</title><link href="https://michaelnygard.com/blog/2009/05/kudos-to-relevance-and-clojure/"/><id>https://michaelnygard.com/blog/2009/05/kudos-to-relevance-and-clojure/</id><published>2009-05-06T01:18:26-05:00</published><updated>2009-05-06T01:18:26-05:00</updated><content type="html"><![CDATA[<p>It&rsquo;s been a while since I blogged anything, mainly because most of my work lately has either been mind-numbing corporate stuff, or so highly contextualized that it wouldn&rsquo;t be productive to write about.</p>
<p>Something came up last week, though, that just blew me away.</p>
<p>For various reasons, I&rsquo;ve engaged <a href="http://thinkrelevance.com/">Relevance</a> to do a project for me.  (Actually, the first results were so good that I&rsquo;ve now got at least three more projects lined up.)  They decided&mdash;and by &ldquo;they&rdquo;, I mean <a href="http://blog.thinkrelevance.com/2009/4/4/programming-clojure-beta-9-is-out">Stuart Halloway</a>&mdash;to write the engine at the heart of this application in Clojure. That makes it sound like I was reluctant to go along, but actually, I was interested to see if the result would be as expressive and compact as everyone says.</p>
<p>Let me make a brief aside here and comment that I&rsquo;m finding it much harder to be the customer on an agile project than to be a developer.  I think there are two main reasons. First, it&rsquo;s hard for me to keep these guys supplied with enough cards to fill an iteration. They&rsquo;re outrunning me all the time.  Big organizations like my employer just take a long time to decide anything.  Second, there&rsquo;s nobody else I can defer to when the team needs a decision.  It often takes two weeks just for me to get a meeting scheduled with all of the stakeholders inside my company.  That&rsquo;s an entire iteration gone, just waiting to <em>get</em> to the meeting to make a decision!  So, I&rsquo;m often in the position of making decisions that I&rsquo;m not 100% sure will be agreeable to all parties.  So far, they have mostly worked out, but it&rsquo;s a definite source of anxiety.</p>
<p>Anyway, back to the main point I wanted to make.</p>
<p>My personal theme is making software production-ready. That means handling all the messy things that happen in the real world.  In a lab, for example, only one batch file ever needs to be processed at once.  You never have multiple files waiting for processing, and files are always fully present before you start working on them. In production, that only happens if you guarantee it.</p>
<p>Another example, from my system.  We have a set of rules (which are themselves written in Clojure code) that can be changed by privileged users.  After changing the configuration, you can tell the daemonized Clojure engine to &ldquo;(reload-rules!)&rdquo;.  The &ldquo;!&rdquo; at the end of that function means it&rsquo;s an imperative with major side effects, so the rules get reloaded right now.</p>
<p>I thought I was going to catch them up when I asked, oh so innocently, &ldquo;So what happens when you say (reload-rules!) while there&rsquo;s a file being processed on the other thread?&rdquo;  I just love catching people when they haven&rsquo;t dealt with all that nasty production stuff.</p>
<p>After a brief sidebar, Stu and <a href="http://vanderburg.org/">Glenn Vanderburg</a> decided that, in fact, nothing bad would happen at all, despite reloading rules in one thread while another thread was in the middle of using the rules.</p>
<p>Clojure uses a flavor of transactional memory, along with persistent data structures. No, that doesn&rsquo;t mean they go in a database. It means that changes to a data structure can only be made inside of a transaction.  The new version of the data structure and the old version exist simultaneously, for as long as there are outstanding references to them. So, in my case, that meant that the daemon thread would &ldquo;see&rdquo; the old version of the rules, because it had dereferenced the collection prior to the &ldquo;reload-rules!&rdquo;  Meanwhile, the reload-rules! function would modify the collection in its own transaction.  The next time the daemon thread comes back around and uses the reference to the rules, it&rsquo;ll just see the new version of the rules.</p>
<p>In other words, two threads can both use the same reference, with complete consistency, because they each see a point-in-time snapshot of the collection&rsquo;s state.  The team didn&rsquo;t have to do anything special to make this happen&hellip; it&rsquo;s just the way that Clojure&rsquo;s references, persistent data structures, and transactional memory work.</p>
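<p>The snapshot behavior can be sketched in a few lines. This is a loose Python analogy, not Clojure and not how Clojure&rsquo;s transactional memory is implemented; the <code>Ref</code> class and the rule names are hypothetical. The point is only that a reader who has already dereferenced an immutable value keeps that value, even after the reference is swapped.</p>

```python
import threading


class Ref:
    """Holds a reference to an immutable value; readers take snapshots."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def deref(self):
        # Returns whatever immutable value is current at this instant.
        return self._value

    def swap(self, update):
        # Build a new value from the old one, then replace the reference.
        with self._lock:
            self._value = update(self._value)


rules = Ref(("rule-a", "rule-b"))            # tuples are immutable

in_flight = rules.deref()                    # the worker starts a file with this snapshot
rules.swap(lambda old: old + ("rule-c",))    # meanwhile, someone reloads the rules

still_old = in_flight                        # the worker keeps seeing two rules
next_pass = rules.deref()                    # the next file sees all three
```

<p>Because the old tuple is never modified in place, the in-flight work stays consistent without any extra locking on the reader&rsquo;s side.</p>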
<p>Even though I didn&rsquo;t get to catch Stu and Glenn out on a production readiness issue, I still had to admit that was pretty frickin&rsquo; cool.</p>
]]></content></entry><entry><title>JAOO Australia in 1 Month</title><link href="https://michaelnygard.com/blog/2009/04/jaoo-australia-in-1-month/"/><id>https://michaelnygard.com/blog/2009/04/jaoo-australia-in-1-month/</id><published>2009-04-03T12:12:06-05:00</published><updated>2009-04-03T12:12:06-05:00</updated><content type="html"><![CDATA[<p>The <a href="http://jaoo.com.au/">Australian JAOO</a> conferences are now just one month away.  I&rsquo;ve wanted to get to Australia for at least ten years now, so I am thrilled to finally get there.</p>
<p>I&rsquo;ll be delivering a tutorial on <a href="http://jaoo.com.au/sydney-2009/presentation/Release+IT">production ready software</a> in both the <a href="http://jaoo.com.au/brisbane-2009/">Brisbane</a> and <a href="http://jaoo.com.au/sydney-2009/">Sydney</a> conferences. This tutorial was a hit at <a href="http://qconlondon.com">QCon London</a>, where I first delivered it. The Australian version will be further improved.</p>
<p>During the main conference, I&rsquo;ll be delivering a two-part talk on <a href="http://jaoo.com.au/sydney-2009/presentation/Failure+Comes+in+Flavours+(Part+1)">common failure modes</a> of distributed systems and <a href="http://jaoo.com.au/sydney-2009/presentation/Failure+Comes+in+Flavours+(Part+2)">how to recover</a> from such breakage. These talks apply whether you&rsquo;re building web facing systems or internal shared services/SOA projects.</p>
]]></content></entry><entry><title>Quantum Backups</title><link href="https://michaelnygard.com/blog/2009/03/quantum-backups/"/><id>https://michaelnygard.com/blog/2009/03/quantum-backups/</id><published>2009-03-20T08:40:54-05:00</published><updated>2009-03-20T08:40:54-05:00</updated><content type="html"><![CDATA[<p>Backups are the only macroscopic system we commonly deal with that exhibits quantum mechanical effects. This is odd enough that I&rsquo;ve spent some time getting tangled up in these observations.</p>
<p>Until you attempt a restore, a backup set is neither good nor bad, but a superposition of both. This is the <em>superposition principle</em>.</p>
<p>The peculiarity of the superposition principle is dramatically illustrated with the experiment of <em>Schrödinger&rsquo;s backup</em>. This is when you attempt to restore Schrödinger&rsquo;s pictures of his cat, and discover that the cat is not there.</p>
<p>In a startling corollary, if you use offsite vaulting, a second quantum variable is introduced, in that the backup set exists and does not exist simultaneously. A curious effect emerges upon applying the Hamiltonian operator. The operator shows that certain eigenvalues are always zero, revealing that prime numbered tapes greater than 5 in a set never exist.</p>
<p>Finally, the <em>Heisenbackup principle</em> says that the user of a system is entangled with the system itself. As a result, within 30 days of consciously deciding that you do not need to run a backup, you will experience a complete disk crash.  Because you&rsquo;ve just read this, your 30 days start now.</p>
<p>Sorry about that.</p>
]]></content></entry><entry><title>Update: Sun Cloud API Not the Same as Amazon</title><link href="https://michaelnygard.com/blog/2009/03/update-sun-cloud-api-not-the-same-as-amazon/"/><id>https://michaelnygard.com/blog/2009/03/update-sun-cloud-api-not-the-same-as-amazon/</id><published>2009-03-19T07:29:33-05:00</published><updated>2009-03-19T07:29:33-05:00</updated><content type="html"><![CDATA[<p>It looks like the early reports that Sun&rsquo;s cloud API would be compatible with AWS resulted from the reporters&rsquo; exuberance (or mere confusion.)</p>
<p>It&rsquo;s actually nicer than Amazon&rsquo;s.</p>
<p>It is based on the REST architectural style, with representations in JSON.  In fact, I might start using it as the best embodiment of REST principles.  You start with an HTTP GET of &ldquo;/&rdquo;. In the response to this and every other request, it is the hyperlinks in the response that indicate what actions are allowed.</p>
<p>Sun has a wiki to describe the API, with a very nicely illustrated &ldquo;Hello, Cloud&rdquo; example.</p>
]]></content></entry><entry><title>Can you make that meeting?</title><link href="https://michaelnygard.com/blog/2009/03/can-you-make-that-meeting/"/><id>https://michaelnygard.com/blog/2009/03/can-you-make-that-meeting/</id><published>2009-03-18T15:55:13-05:00</published><updated>2009-03-18T15:55:13-05:00</updated><content type="html"><![CDATA[<p>I&rsquo;m convinced that the next great productivity revolution will be de-matrixing the organizations we&rsquo;ve just spent ten years slicing and dicing.</p>
<p>Yesterday, I ran into a case in point: What are the odds that three people can schedule a meeting this week versus having to push it into next week?</p>
<p>Turns out that if they&rsquo;re each 75% utilized, then there&rsquo;s only a 15% chance they can schedule a one hour meeting this week.  (If you always schedule 30 minute meetings instead of one hour, then the odds go up to about 25%.)</p>
<p>Here&rsquo;s the probability curve that the meeting can happen. This assumes, by the way, that there are no lunches or vacation days, and that all parties are in the same time zone. It only gets worse from here.</p>
<img src="/images/blog/meeting/meeting_odds_3_parties.png" style="float: none; border: none;" />
<p>So, overall, there&rsquo;s about an 85% chance that 3 random people in a meeting-driven company will have to defer until next week.</p>
<p>Bring it up to 10 people, in a consensus-driven, meeting-oriented company, and the odds drop to 0.00095%.</p>
<p>No wonder &ldquo;time to first meeting&rdquo; seems to dominate &ldquo;time to do stuff.&rdquo;</p>
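<p>For anyone who wants to play with the numbers, here is a rough sketch in Python. The model is an assumption on my part: each person is free in any given slot with probability one minus their utilization, calendars are independent, and there are 10 one-hour slots left to choose from (20 if you use half-hour slots). The function name and parameters are made up for this example.</p>

```python
def chance_of_meeting(people, utilization, candidate_slots):
    """Probability that at least one slot is free for everyone."""
    free_for_one = 1.0 - utilization
    free_for_all = free_for_one ** people            # assumes independent calendars
    none_work = (1.0 - free_for_all) ** candidate_slots
    return 1.0 - none_work


one_hour = chance_of_meeting(people=3, utilization=0.75, candidate_slots=10)
half_hour = chance_of_meeting(people=3, utilization=0.75, candidate_slots=20)
ten_people = chance_of_meeting(people=10, utilization=0.75, candidate_slots=10)

print(f"3 people, one-hour slots:  {one_hour:.1%}")
print(f"3 people, half-hour slots: {half_hour:.1%}")
print(f"10 people, one-hour slots: {ten_people:.5%}")
```

<p>Under those assumptions the results land close to the figures above: roughly 15% for three people, a bit over 25% with half-hour slots, and about 0.00095% for ten people.</p>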
]]></content></entry><entry><title>Amazon as the new Intel</title><link href="https://michaelnygard.com/blog/2009/03/amazon-as-the-new-intel/"/><id>https://michaelnygard.com/blog/2009/03/amazon-as-the-new-intel/</id><published>2009-03-18T11:31:07-05:00</published><updated>2009-03-18T11:31:07-05:00</updated><content type="html"><![CDATA[<p><strong>Update: Please read <a href="/blog/2009/03/update-sun-cloud-api-not-the-s/">this update</a>. The information underlying this post was based on early, somewhat garbled, reports.</strong></p>
<p>A brief digression from the <a href="/blog/2009/03/getting-real-about-reliability/">unpleasantness of reliability</a>.</p>
<p>This morning, Sun announced their re-entry into the cloud computing market. After withdrawing Network.com from the marketplace a few months ago, we were all wondering what Sun&rsquo;s approach would be. No hardware vendor can afford to ignore the cloud computing trend&hellip; it&rsquo;s going to change how customers view their own data centers and hardware purchases.</p>
<p>One thing that really caught my interest was the description of Sun&rsquo;s cloud offering. It sounded really, really similar to <a href="http://aws.amazon.com">AWS</a>. Then I heard the E-word and it made perfect sense. Sun <a href="http://www.sun.com/aboutsun/pr/2009-03/sunflash.20090318.2.xml">announced</a> that they will use <a href="http://eucalyptus.cs.ucsb.edu/">EUCALYPTUS</a> as the control interface to their solution. EUCALYPTUS is an open-source implementation of the AWS APIs.</p>
<p>Last week at <a href="http://qconlondon.com">QCon London</a>, we heard <a href="http://www.linkedin.com/in/simonwardley">Simon Wardley</a> give a brilliant talk, in which he described <a href="http://www.canonical.com/">Canonical</a>&rsquo;s plan to create a de facto open standard for cloud computing by seeding the market with open source implementations. Canonical&rsquo;s plan? <a href="http://www.ubuntu.com/">Ubuntu</a> and private clouds running EUCALYPTUS.</p>
<p>It looks like Amazon may be setting the standard for cloud computing, in the same way that Intel set the standard for desktop and server computing, by defining the programming interface.</p>
<p>I don&rsquo;t worry about this, for two reasons. One, it forestalls any premature efforts to force a de jure standard. This space is still young enough that an early standard can&rsquo;t help but be a drag on exploration of different business and technical models. Two, Amazon has done an excellent job as a technical leader. If their APIs &ldquo;win&rdquo; and become de facto standards, well, we could do a lot worse.</p>
]]></content></entry><entry><title>Getting Real About Reliability</title><link href="https://michaelnygard.com/blog/2009/03/getting-real-about-reliability/"/><id>https://michaelnygard.com/blog/2009/03/getting-real-about-reliability/</id><published>2009-03-16T17:10:02-05:00</published><updated>2009-03-16T17:10:02-05:00</updated><content type="html"><![CDATA[<p><img width="103" height="240" border="0" align="left" src="/images/blog/reliability/rel_1-2.png" />In my <a href="/blog/2009/02/reliability-math/">last post</a>, I used some back-of-the-envelope reliability calculations, with just one interesting bit, to estimate the availability of a single-stacked web application, shown again here. I cautioned that there were a lot of unfounded assumptions baked in. Now it's time to start removing those assumptions, though I reserve the right to introduce a few new ones.</p>

<h2>Is it there when I want it?</h2>

<p>First, let's talk about the hardware itself.&nbsp; It's very likely that these machines are what some vendors are calling &quot;industry-standard servers.&quot; That's a polite euphemism for &quot;x86&quot; or &quot;ia64&quot; that just doesn't happen to mention Intel. ISS servers are expected to exhibit 99.9% availability.</p>

<p>There's something a little bit fishy about that number, though. It's one thing to say that a box is up and running (&quot;available&quot;) 99.9% of the times you look at it. If I check it every hour for a year, and find it alive at least 8,756 out of 8,765 times, then it's 99.9% available. It might have broken just once for 9 hours, or it might have broken 9 times for an hour each, or it might have broken 18 times for half an hour each.</p>
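<p>A quick Python sketch shows how different outage patterns collapse into the same availability figure. The year length and the particular patterns are assumptions picked for illustration; each pattern adds up to the same nine hours of downtime.</p>

```python
HOURS_PER_YEAR = 365 * 24   # 8,760; assumes a non-leap year


def availability(outages, hours_each):
    """Fraction of the year the box was up, given equal-length outages."""
    downtime = outages * hours_each
    return 1.0 - downtime / HOURS_PER_YEAR


patterns = {
    "one 9-hour outage": availability(1, 9.0),
    "nine 1-hour outages": availability(9, 1.0),
    "thirty-six 15-minute outages": availability(36, 0.25),
}

for label, value in patterns.items():
    print(f"{label}: {value:.3%}")
```

<p>All three lines print the same percentage, which is exactly why the availability number alone says nothing about how often the box breaks.</p>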

<p>This is the difference between availability and reliability. Availability measures the likelihood that a system can perform its function at a specific point in time. Reliability, on the other hand, measures the likelihood that a system will not have failed before a point in time. Availability and reliability both matter to your users. In fact, a large number of small outages can be just as frustrating as a single large event. (I do wonder... since both ends of the spectrum seem to stick out in users' memories, perhaps there's an optimum value for the duration and frequency of outages, where they are seldom enough to seem infrequent, but short enough to seem forgivable?)</p>

<p>We need a bit more math at this point.</p>

<h2>It must be science... it's got integrals.</h2>

<p>Let's suppose that hardware failures can be described as a function of time, and that they are essentially random. It's not like the story of the <a href="http://thedailywtf.com/Articles/A__0x26_quot_0x3b_Priceless_0x26_quot_0x3b__Server_Room_0x3a__Priceless.aspx">&quot;priceless&quot; server room</a>, where failure can be expected based on actions or inaction. We'll also carry over the previous assumption that hardware failures among these three boxes are independent. That is, failure of any one box does not make other boxes more likely to fail.</p>

<p>We want to determine the likelihood that the box is available, but the random event we're concerned with is a fault. Thus, we first need to find the probability that a fault has occurred by time t. Checking for a failure is sampling for an event X between times 0 and t.</p>

<p><img style="border: none; float: none" src="/images/blog/reliability/rel_2-1.png" /></p>

<p>The function f(t) is the probability density function that describes failures of this system. We'll come back to that shortly, because a great deal hinges on what function we use here. The reliability of the system, then, is the probability that the event X <em>didn't</em> happen by time t. </p>

<p><img width="286" height="31" style="border: none; float: none;" src="/images/blog/reliability/rel_2-2.png" /></p>

<p>One other equation that will help in a bit is the failure rate, the number of failures to expect per unit time. Like reliability, the failure rate can vary over time. The failure rate is:</p>

<p><img style="border:none; float:none;" src="/images/blog/reliability/rel_2-3.png" /></p>
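<p>Here is a small Python sketch of these three quantities. It assumes the textbook definitions: F(t) is the integral of f from 0 to t, R(t) = 1 - F(t), and the failure rate is f(t) / R(t). It also assumes an exponential density with a made-up constant rate, purely so there is something concrete to evaluate. The function names are invented for this example.</p>

```python
import math

RATE = 0.5   # assumed: failures per year, purely illustrative


def density(t):
    """f(t) for an exponential distribution with constant rate."""
    return RATE * math.exp(-RATE * t)


def failed_by(t):
    """F(t): probability that a fault has occurred by time t."""
    return 1.0 - math.exp(-RATE * t)


def reliability(t):
    """R(t) = 1 - F(t): probability of no fault by time t."""
    return 1.0 - failed_by(t)


def failure_rate(t):
    """f(t) / R(t): failures expected per unit time, given survival so far."""
    return density(t) / reliability(t)


for years in (1, 2, 5):
    print(years, round(reliability(years), 4), round(failure_rate(years), 4))
```

<p>With this particular density the failure rate comes out the same at every age, which is what makes the exponential mathematically convenient and physically suspect.</p>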

<h2>Failure distributions</h2>

<p>So now we've got integrals to infinity of unknown functions. This is progress?</p>

<p>It is progress, but there are some missing pieces. Next time, I'll talk about different probability distributions, which ones make sense for different purposes, and how to calibrate them with observations.</p> 
]]></content></entry><entry><title>Reliability Math</title><link href="https://michaelnygard.com/blog/2009/02/reliability-math/"/><id>https://michaelnygard.com/blog/2009/02/reliability-math/</id><published>2009-02-27T23:20:16-06:00</published><updated>2009-02-27T23:20:16-06:00</updated><content type="html"><![CDATA[<p>Suppose you build a web site out of a single stack of one web, app, and database server. What sort of availability SLA should you be willing to support for this site?</p>

<p><img width="103" height="240" style="border: none; float: left;" src="/images/blog/reliability/rel_1-2.png" /></p>

<p>We'll approach this in a few steps. For the first cut, you'd say that the appropriate SLA is just the expected availability of the site. Availability is defined in different ways depending on when and how you expect to measure it, but for the time being, we'll say that availability is the probability of getting an HTTP response when you submit a request. This is the <em>instantaneous availability</em>. </p>

<p>What is the probability of getting a response from the web server? Assuming that every request goes through all three layers, then the probability of a response is the probability that all three components are working. That is:</p>

<p><img width="282" height="12" style="border: none; float: none;" src="/images/blog/reliability/rel_1-1.png" /></p>

<p>This follows our intuition pretty closely. Since any of the three servers can go down, and any one server down takes down the site, we'd expect to just multiply the probabilities together. But what should we use for the reliability of the individual boxes? We haven't done a test to failure or life cycle test on our vendor's hardware. In fact, if our vendor has any MTBF data, they're keeping it pretty quiet.</p>
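<p>As a sketch in Python, the multiplication looks like this. The per-server values are placeholders rather than measurements, and the function name is made up for the example.</p>

```python
import math


def series_availability(component_availabilities):
    """Availability of a chain where every component must be up."""
    return math.prod(component_availabilities)   # assumes independent failures


# Placeholder values, not measurements: web, app, and database servers.
stack = [0.99, 0.99, 0.99]
print(f"{series_availability(stack):.4f}")
```

<p>Note that the chain is always less available than its weakest member, because every extra layer is another thing that has to be up.</p>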

<p>We can spend some time hunting down server reliability data later. For now, let's just try to estimate it. In fact, let's estimate widely enough that we can be 90% confident that the true value is within our range. This will give us some pretty wide ranges, but that's OK... we haven't dug up much data yet, so there should be a lot of uncertainty. Uncertainty isn't a show stopper, and it isn't an excuse for inaction. It just means there are things we don't yet know. If we can quantify our uncertainty, then we can still make meaningful decisions. (And some of those decisions may be to go study something to reduce the uncertainty!)</p>

<p>Even cheap hardware is getting pretty reliable. Would you expect every server to fail once a year? Probably not. It's less frequent than that. One out of the three servers fails every two years? Seems to be a little pessimistic, but not impossible. Let's start there. If every server fails once every two years, at a constant rate [<a href="#1">1</a>], then we can say that the lower bound on server reliability is 60.6%. Would we expect all of these servers to run for five years straight without a failure? Possible, but unlikely. Let's use one failure over five years as our upper bound. One failure out of fifteen server-years would give an annual availability of 93.5% for each server.</p>
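<p>Those two bounds fall out of the constant-rate assumption: the chance of getting through one year with no failures is e raised to the negative failure rate. A short Python sketch, with a hypothetical helper name, shows the arithmetic.</p>

```python
import math


def survival_probability(failures, server_years, horizon_years=1.0):
    """Chance of zero failures over the horizon at a constant failure rate."""
    rate = failures / server_years            # failures per server-year
    return math.exp(-rate * horizon_years)


lower = survival_probability(failures=1, server_years=2)    # one failure every two years
upper = survival_probability(failures=1, server_years=15)   # one failure in fifteen server-years

print(f"lower bound: {lower:.4f}")
print(f"upper bound: {upper:.4f}")
```

<p>The results are roughly 0.6065 and 0.9355, which are the 60.6% and 93.5% figures used here.</p>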

<p>So, each server's availability is somewhere between 60.6% and 93.5%. That's a pretty wide range, and won't be satisfactory to many people. That's OK, because it reflects our current degree of uncertainty.</p>

<p>To find the overall reliability, I could just take the worst case and plug it in for all three probabilities, then plug in the best case. That slightly overstates the edge cases, though. I'm better off getting Excel to help me run a Monte Carlo analysis to give me an average across a bunch of scenarios. I'll construct a row that randomly samples a scenario from within these ranges. It will pick three values between 60.6% and 93.5% and compute their product. Then, I'll copy that row 10,000 times by dragging it down the sheet. Finally, I'll average out the computed products to get a range for the overall reliability. When I do that, I get a weighted range of 28.9% to 62.6%. [<a href="#2">2</a>] [<a href="#3">3</a>]</p>
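<p>The same experiment can be sketched in Python instead of Excel. Two things here are assumptions, since the details live in the spreadsheet: each server's value is drawn uniformly from the range, and the resulting range is read off as the 5th and 95th percentiles of the products. With a different random draw or a different way of summarizing, the endpoints will not match the numbers above exactly.</p>

```python
import random

LOW, HIGH = 0.606, 0.935   # per-server range from the text
TRIALS = 10_000

rng = random.Random(42)    # fixed seed so the run is repeatable

# One scenario = three independently sampled servers multiplied together.
products = sorted(
    rng.uniform(LOW, HIGH) * rng.uniform(LOW, HIGH) * rng.uniform(LOW, HIGH)
    for _ in range(TRIALS)
)

mean = sum(products) / TRIALS
p05 = products[int(0.05 * TRIALS)]   # 5th percentile
p95 = products[int(0.95 * TRIALS)]   # 95th percentile

print(f"mean {mean:.3f}, 90% of scenarios between {p05:.3f} and {p95:.3f}")
```

<p>Whatever the exact endpoints, they sit well inside the all-worst and all-best cases, which is the sense in which plugging in the extremes overstates the edges.</p>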

<p>Yep, this single-stack web site will be available somewhere between 28.9% and 62.6% of the time. [<a href="#4">4</a>]</p>

<p>Actually, it's likely to be <em>worse</em> than that. There are two big problems in the analysis so far. First, we've only accounted for hardware failures, but software failures are a much bigger contributor to downtime. Second, more seriously, the equation for overall reliability assumes that all failures are independent. That is, we implicitly assumed that nothing could cause more than one of these servers to fail simultaneously. Talk about Pollyanna! We've got common mode failures all over the place, especially in the network, power, and data center arenas.</p>

<p>Next time, we'll start working toward a more realistic calculation.</p>

<hr>

<p><a name="1">1</a>. I'm using a lot of simplifying assumptions right now. Over time, I'll strip these away and replace them with more realistic calculations. For example, a constant failure rate implies an exponential distribution function. It is mathematically convenient, but doesn't represent the effects of aging on moving components like hard drives and fans.</p>

<p><a name="2">2</a>. You can download the spreadsheet <a href="/downloads/Single_Stack_Monte_Carlo.xls">here</a>.</p>

<p><a name="3">3</a>. These estimation and analysis techniques are from &quot;<a href="http://www.howtomeasureanything.com">How to Measure Anything</a>&quot; by Doug Hubbard. </p>

<p><a name="4">4</a>. Clearly, for a single-threaded stack like this, you can achieve much higher reliability by running all three layers on a single physical host.</p> 
]]></content></entry><entry><title>2009 Calendar as OmniGraffle Stencil</title><link href="https://michaelnygard.com/blog/2009/02/2009-calendar-as-omnigraffle-stencil/"/><id>https://michaelnygard.com/blog/2009/02/2009-calendar-as-omnigraffle-stencil/</id><published>2009-02-27T23:10:32-06:00</published><updated>2009-02-27T23:10:32-06:00</updated><content type="html"><![CDATA[<p>I had need of a stencil that would let me drop monthly calendars on a number of pages. I found it useful, and someone else might, too.</p><p><a href="/downloads/2009.gstencil">Download the stencil.</a></p> 
]]></content></entry><entry><title>Fast Iteration versus Elegant Design</title><link href="https://michaelnygard.com/blog/2009/02/fast-iteration-versus-elegant-design/"/><id>https://michaelnygard.com/blog/2009/02/fast-iteration-versus-elegant-design/</id><published>2009-02-21T16:00:42-06:00</published><updated>2009-02-21T16:00:42-06:00</updated><content type="html"><![CDATA[<p>I love the way that <a target="_blank" href="http://www.reddit.com/r/programming/">proggit</a> bubbles stuff around. Today, for a while at least, the top link is to a story from <a target="_blank" href="http://www.salon.com">Salon</a> in May of 2000 about Bill and Lynne Jolitz, the creators of <a target="_blank" href="http://www.386bsd.org/">386BSD</a>.</p><p>[An aside: I'm not sure exactly when I became enough of a graybeard to remember as current events things which are now discussed as history. It's really disturbing that an article from almost a decade ago talks about events seven years earlier than that, and I remember them happening! To me, the <em>real</em> graybeards are the guys that created UNIX and C to begin with. Me? I'm part of the second or third UNIX generation, at best. Sigh...]</p><p>Anyway, Bill and Lynne Jolitz created the first free, open-source UNIX that ran on x86 chips.&nbsp; <a target="_blank" href="http://en.wikipedia.org/wiki/Coherent_(operating_system)">Coherent</a> was around before that, and I think SCO UNIX was available for x86 at the same time. SCO wasn't evil then, just expensive. In those days, you had to lay down some serious jing to get UNIX on your PC. <a target="_blank" href="http://www.minix3.org/">Minix</a> was available for free, but <a target="_blank" href="http://en.wikipedia.org/wiki/Andrew_S._Tanenbaum">Tanenbaum</a> held firm that Minix should teach principles rather than be a production OS, so he favored pedagogical value over functionality. Consequently, Minix wasn't a full UNIX implementation. (At least at that time. 
It might be now.)<br /></p><p>Just contemplate the hubris of two programmers deciding that they would create their own operating system, to be UNIX, but fixing the flaws, hacks, and workarounds that had built up over more than a decade. Not only that, but they would choose to give it away for the cost of floppies! And not only that, but they would build it for a processor that serious UNIX people <a target="_blank" href="http://dilbert.com/strips/comic/1995-06-24/">sneered at</a>. Most impressive of all, they succeeded. 386BSD was a technically superior, well-architected version of UNIX for commodity hardware. The Jolitzes extrapolated Intel's growth curve and rapid product cycles and saw that x86 processors would advance far faster than the technically superior RISC chips.</p><p>At various times, I ran Minix, 386BSD, and SCO UNIX on my PC well before I even heard of Linux. Each of them had the field before Linus even made his 0.1 release. <br /></p><p>So why is Linux everywhere, and we only hear about 386BSD in historical contexts? There is exactly one answer, and it's what <a target="_blank" href="http://en.wikipedia.org/wiki/Eric_Raymond">Eric Raymond</a> was really talking about in <a target="_blank" href="http://www.amazon.com/gp/product/0596001088?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0596001088">The Cathedral and the Bazaar</a>. TCatB has been seen mostly as an argument for open-source versus commercial software, but what Raymond saw was that the real competition comes down to an open contribution model versus closed contributions. Linus' promiscuous contribution policy simply let Linux out-evolve 386BSD. More contributors meant more drivers, more bug fixes, more enhancements... more ideas, ultimately. Two people, no matter how talented, cannot outcode thousands of Linux contributors. 
The best programmers are 10 times more productive than the average, and I would rate Bill and Lynne among the very best. But, as of last April, the Linux Foundation reported that more than 3,600 people had contributed to the kernel alone.<br /></p><p>Iteration is one of the fundamental dynamics. Iteration facilitates adaptation, and adaptation wins competition. History is littered with the carcasses of &quot;superior&quot; contenders that <a href="http://www.amazon.com/gp/product/0060521996?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0060521996">simply didn't adapt as fast</a> as their victorious challengers.<br /></p> 
]]></content></entry><entry><title>Why Do Enterprise Applications Suck?</title><link href="https://michaelnygard.com/blog/2009/02/why-do-enterprise-applications-suck/"/><id>https://michaelnygard.com/blog/2009/02/why-do-enterprise-applications-suck/</id><published>2009-02-20T22:23:55-06:00</published><updated>2009-02-20T22:23:55-06:00</updated><content type="html"><![CDATA[<p>What is it about enterprise applications that makes them suck?</p>
<p>I mean, have you ever seen someone write <a href="http://radar.oreilly.com/2008/11/why-i-like-twitter.html">1,500 words</a> about how much they love their corporate expense reporting system? Or spend their free time mashing up the job posting system together with Google maps? Of course not. But why not?</p>
<p>There&rsquo;s a quality about some software that inspires love in its users, and it&rsquo;s totally absent from enterprise software. The best you can ever say about enterprise software is that it doesn&rsquo;t get in the way of the business. At its worst, enterprise software creates more work than it automates.</p>
<p>For example, in my company, we&rsquo;ve got a personnel management tool that&rsquo;s so unpredictable that every admin in the company keeps his or her own spreadsheet of requests that have been entered. They have to do that because the system itself randomly eats input, drops requests, or silently trashes forms. It&rsquo;s not a training problem, it&rsquo;s just lousy software.</p>
<p>We&rsquo;ve got a time-tracking system that has a feature where an employee can enter a vacation request. There&rsquo;s a little workflow triggered to have the supervisor approve the vacation request. I&rsquo;ve seen it used inside two groups. In both cases, the employee negotiates the leave request via email, then enters it into the time-tracking system. I know several people who use Travelocity to find their flights before they log in to our corporate travel system. And you wouldn&rsquo;t even believe how hard our sales force automation system is to use compared to Salesforce.com.</p>
<p>Way back in 1937, <a href="http://en.wikipedia.org/wiki/Ronald_Coase">Ronald Coase</a> elaborated his theory about why corporations exist. He said that a firm&rsquo;s boundaries should be drawn so as to minimize <a href="http://en.wikipedia.org/wiki/Transaction_cost">transaction costs</a>&hellip; search and information costs, bargaining costs, and cost of policing behavior. By almost every measure, then, external systems offer lower transaction costs than internal ones. No wonder some people think IT doesn&rsquo;t matter.</p>
<p>If the best you can do is not mess up a nondifferentiating function like personnel management, it&rsquo;s tough to claim that IT can be a competitive advantage. So, again I&rsquo;ll ask, why?</p>
<p>I think there are exactly four reasons that internal corporate systems are so unloved and unlovable.</p>
<ol>
<li>They serve their corporate overlords, not their users.</li>
</ol>
<p>This is simple. Corporate systems are built according to what analysts believe will make the company more efficient. Unfortunately, this too often falls prey to penny-wise-pound-foolish decisions that micro-optimize costs while suboptimizing the overall <a href="http://www.michaelnygard.com/blog/2008/02/outrunning-your-headlights/">value stream</a>. Optimizing one person&rsquo;s job with a system that creates more work for a number of other people doesn&rsquo;t do any good for the company as a whole.</p>
<ol start="2">
<li>They only do gray-suited, stolidly conservative things.</li>
</ol>
<p>Corporate IM now looks like an obvious idea, but messaging started frivolously. It was blocked, prohibited, and firewalled. In 1990, who would have spent precious capital on something to let cubicle-dwellers ask each other what they were doing for lunch? As it turns out, a few companies were on the leading edge of that wave, but their illicit communications were done in spite of IT. How many companies would build something to <a href="http://innovationgames.com/">Create Breakthrough Products Through Collaborative Play</a>?</p>
<ol start="3">
<li>They have captive audiences.</li>
</ol>
<p>If your company has six purchasing systems, that&rsquo;s a problem. If you have a choice of six online stores, that&rsquo;s competition.</p>
<ol start="4">
<li>They lack &ldquo;give-a-shitness&rdquo;.</li>
</ol>
<p>I think this one matters most of all. Commerce sites, Web 2.0 startups, IM networks&hellip; the software that people love was created by people who love it, too. It&rsquo;s either their ticket to F-U money, it&rsquo;s their brainchild, or it&rsquo;s their livelihood. The people who build those systems live with them for a long time, often years. They have reason to care about the design and about keeping the thing alive.</p>
<p>This is also why, once acquired, startups often lose their luster. The founders get their big check and cash out. The barnstormers that poured their passion into it discover they don&rsquo;t like being assimilated and drift away.</p>
<p>Architects, designers, and developers of corporate systems usually have little or no voice in what gets built, or how, or why. (Imagine the average IT department meeting where one developer says this system really ought to be built using Scala and Lift.) They don&rsquo;t sign on, they get assigned. I know that individual developers do care passionately about their work, but they usually have no way to really make a difference.</p>
<p>The net result is that corporate software is software that nobody gives a shit about: not its creators, not its investors, and not its users.</p>
]]></content></entry><entry><title>Tracking and Trouble</title><link href="https://michaelnygard.com/blog/2009/02/tracking-and-trouble/"/><id>https://michaelnygard.com/blog/2009/02/tracking-and-trouble/</id><published>2009-02-19T11:55:27-06:00</published><updated>2009-02-19T11:55:27-06:00</updated><content type="html"><![CDATA[<p>Pick something in your world and start measuring it.&nbsp; Your measurements will surely change a little from day to day. Track those changes over a few months, and you might have a chart something like this.</p>

<p><img width="700" height="437" style="border:none; float:none;"  title="First 100 samples" alt="First 100 samples" src="/images/blog/tracking/tracking_variable_first_100.png" /></p>

<p>Now that you've got some data assembled, you can start analyzing it. The average over this sample is 59.5. It's got a variance of 17, which is about 28% of the mean. You can look for trends. For example, we seem to see an upswing for the first few months, then a pullback starting around 90 days into the cycle. In addition, it looks like there is a pretty regular oscillation superimposed on the main trend, so you might be looking at some kind of weekly pattern as well.</p>

<p>The next few months of data should make the patterns clearer.</p>

<p><img width="700" height="437" style="border:none; float:none;" src="/images/blog/tracking/tracking_variable_first_200.png" alt="First 200 samples." title="First 200 samples." /></p>

<p>Indeed, from this chart, it looks pretty clear that the pullback around 100 days was the early indicator of a flattening in the overall growth trend from the first few months. Now, the weekly oscillations are pretty much the only movement, with just minor wobbles around a ceiling.</p>

<p>I'll fast forward and show the full chart, spanning 1000 samples (nearly three years' worth of daily measurements).</p>

<p><img width="700" height="435" style="border:none; float:none;" src="/images/blog/tracking/tracking_variable_full_chart.png" alt="Full chart of 1000 samples" title="Full chart of 1000 samples" /></p>

<p>Now we can see that the ceiling established at 65 held against upward pressure until about 250 days in, when it finally gave way and we reached a new support at about 80. That support lasted for another year, when we started to see some gradual downward pressure resulting in a pullback to the mid-70s.</p>

<p>You've probably realized by now that I'm playing a bit of a game with you. These charts aren't from any stock market or weather data. In fact, they're completely random. I started with a base value of 55 and added a little random value each &quot;day&quot;.</p>
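<p>A series like this takes only a few lines to generate. The sketch below is a guess at the recipe, not the code behind the charts: the post only says the base value was 55 and that a small random value was added each day, so the step size of plus or minus 2, the seed, and the name <code>random_walk</code> are all assumptions for illustration.</p>

```ruby
# Random walk: start at a base value and add a small random step each "day".
# The step range and the seed are illustrative assumptions.
def random_walk(steps, start: 55.0, max_step: 2.0, rng: Random.new(2009))
  value = start
  Array.new(steps) do
    value += rng.rand(-max_step..max_step)
  end
end

series = random_walk(1000)
mean = series.sum / series.length
variance = series.sum { |v| (v - mean)**2 } / series.length

puts format('first 100 days: mean %.1f', series.first(100).sum / 100)
puts format('all 1000 days:  mean %.1f, variance %.1f', mean, variance)
```

<p>Change the seed and you get a different "market" every time, complete with its own ceilings, supports, and pullbacks.</p>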

<p>When you see the final chart, it's easy to see it as the result of a random number generator.&nbsp; If you were to live this chart, day by day, however, it's exceedingly hard <em>not</em> to impose some kind of meaning or interpretation on it. The tough part is that you actually can see some patterns in the data.&nbsp; I didn't force the weekly oscillations into the random number function; they just appeared in the graph. We are all exceptionally good at <a target="_blank" href="http://www.michaelshermer.com/weird-things/">pattern detection and matching</a>. We're so good, in fact, that we find patterns all over the place. When we are confronted with obvious patterns, we tend to believe that they're real or that they emerge from some underlying, meaningful structure. But sometimes, they're really just nothing more than randomness.</p><p><a target="_blank" href="http://www.amazon.com/gp/product/0812975219?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0812975219">Nassim Nicholas Taleb</a> is today's guru of randomness, but <a target="_blank" href="http://www.amazon.com/gp/product/0465043577?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0465043577">Benoit Mandelbrot</a> wrote about it earlier in the decade, and <a target="_blank" href="http://www.amazon.com/gp/product/0060555661?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0060555661">Benjamin Graham</a> wrote about this problem back in the 1920's. I suspect someone has sounded this warning every decade since statistics were invented. Graham, Mandelbrot, and Taleb all tell us that, if we set out to find patterns in historical data, we will always find them. Whether those patterns have any intrinsic meaning is another question entirely. 
Unless we discover that there are real forces and dynamics that underlie the data, we risk fooling ourselves again and again.</p>

<p>We can't abandon the idea of prediction, though. Randomness is real, and we have a tendency to be fooled by it. Still, even in the face of those facts, we really do have to make predictions and forecasts. Fortunately, there are about a dozen really effective ways to deal with the fundamental uncertainty of the future. I'll spend a few posts exploring them.</p> 
]]></content></entry><entry><title>Booklist</title><link href="https://michaelnygard.com/blog/2009/02/booklist/"/><id>https://michaelnygard.com/blog/2009/02/booklist/</id><published>2009-02-14T18:10:34-06:00</published><updated>2009-02-14T18:10:34-06:00</updated><content type="html"><![CDATA[<p>I made a LibraryThing list of books relevant to the stuff that&rsquo;s banging around in my head now.  These are in no particular order or organization.  In fact, this is a live widget, so it might change as I think of other things that should be on the list.</p>
<p>The key themes here are <em>time</em>, <em>complexity</em>, <em>uncertainty</em>, and <em>constraints</em>.  If you&rsquo;ve got recommendations along these lines, please send them my way.</p>
<p><em>2024 update: the widget no longer works, the account is lost, this list is a victim of linkrot</em></p>
]]></content></entry><entry><title>Cold Turkey</title><link href="https://michaelnygard.com/blog/2009/02/cold-turkey/"/><id>https://michaelnygard.com/blog/2009/02/cold-turkey/</id><published>2009-02-13T22:10:23-06:00</published><updated>2009-02-13T22:10:23-06:00</updated><content type="html"><![CDATA[<p>Last night, I did something pretty drastic.&nbsp; It wasn't on impulse... I had been thinking about this for quite a while. Finally, I decided to take the band-aid approach and just do it all at once.</p><p>I deleted all my games.</p><p>New and old alike, they all went.&nbsp; Bioshock, System Shock, System Shock II.&nbsp; GTA IV. GTA: Vice City. (I skipped San Andreas.) Venerable Diablo I and II, not to mention their leering cousin Overlord. Age of Empires. Several versions of Peggle and Bejeweled. Warcraft III. Every incarnation of Half-Life and Half-Life 2. Uplink, Darwinia, Wingnuts, Weird Worlds and SPORE.</p><p>Well, OK, it wasn't <em>that</em> hard to give up SPORE, but seriously, deleting Darwinia hurt.</p><p>Why chuck hundreds of dollars of software into the bin? It's all about time. My own time and time with a capital 'T'. I need time to understand Time. Too much recombinant thought has taken up residence. It's time to marshal these unruly ideas and get them out. So, the games served during gestation, but now it's time and Time and past time for me to put them aside and get scholastic. Put pen to paper, or fingers to keyboard. Time to run some numbers, see the scenarios, and try to synthesize a cohesive whole. Time to abstract and distill and methodologize.<br /></p><p>I know I'm being obscure. How can I not? Take the number of people exposed to a given process theory (OODA). Multiply it by the fraction who also know the second through the seventh (ToC, Lean, Six Sigma, TQM, Agile, Strategic Navigation). Mix in dynamical systems thinking (Senge, Liker, Hock.) Intersect that group with people who know something about uncertainty, complexity, and time. 
Now intersect it with people who view all the world as material and economic flux.&nbsp; (If you are a member of the resulting set, I want to talk to you!)&nbsp; I know all these things are deeply connected, but if I could articulate how, and why, then I'd already be done.</p><p>One thing I am already sure about, though, is this: It is all about Time. Time is far more fundamental and far less understood than you'd think.&nbsp; I'm not just talking about inappropriately scaled-up quantum mechanics metaphors.&nbsp; I mean that people fundamentally trip up on Time all the time.&nbsp; &quot;The Black Swan&quot; is just the tip of the iceberg. <br /></p><p>If it works, I'll sound like an utter crackpot, raving and waving my very own personal ToE.</p><p>If it doesn't work, well, Steam knows which games I bought. I can always reinstall them.</p> 
]]></content></entry><entry><title>Subtle Interactions, Non-local Problems</title><link href="https://michaelnygard.com/blog/2009/02/subtle-interactions-non-local-problems/"/><id>https://michaelnygard.com/blog/2009/02/subtle-interactions-non-local-problems/</id><published>2009-02-12T09:54:57-06:00</published><updated>2009-02-12T09:54:57-06:00</updated><content type="html"><![CDATA[<p><a href="http://tech.puredanger.com/">Alex Miller</a> has a really interesting blog post up today. In <a href="http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/">LBQ + GC = slow</a>, he shows how <a href="http://java.sun.com/javase/6/docs/api/java/util/concurrent/LinkedBlockingQueue.html">LinkedBlockingQueue</a> can leave a chain of references from tenured dead objects to live young objects.&nbsp; That sounds really dirty, but it actually means something to Java programmers. Something bad.</p><p>The effect here is a subtle interaction between the code and the mostly hidden, yet omnipresent, garbage collector. This interaction just happens to hit a known sore spot for the generational garbage collector. I won't spoil the ending, because I want you to read Alex's piece.</p><p>In effect, a one-line change to LinkedBlockingQueue has a dramatic effect on the garbage collector's performance. In fact, because the problem causes more full GC's, you'd be likely to observe this problem in an area completely unconnected with the queue itself.&nbsp; By leaving these refchains worming through multiple generations in the heap, the queue damages a resource needed by every other part of the application.</p><p>This is a classic common-mode dependency, and it's very hard to diagnose because it results from hidden and asynchronous coupling. <br /></p> 
]]></content></entry><entry><title>Combining here docs and blocks in Ruby</title><link href="https://michaelnygard.com/blog/2009/02/combining-here-docs-and-blocks-in-ruby/"/><id>https://michaelnygard.com/blog/2009/02/combining-here-docs-and-blocks-in-ruby/</id><published>2009-02-06T10:43:21-06:00</published><updated>2009-02-06T10:43:21-06:00</updated><content type="html"><![CDATA[<p>Like a geocache, this is another post meant to help somebody who stumbles across it in a future Google search. (Or as an external reminder for me, when I forget how I did this six months from now.)</p>

<p>I've liked here-documents since the days of shell programming. Ruby has good support for here docs with variable interpolation. For example, if I want to construct a SQL query, I can do this:</p>

<pre>
def build_query(customer_id)
  <<-STMT
    select * 
     from customer
   where id = #{customer_id}
  STMT
end
</pre>

<p>Disclaimer: Don't do this if customer_id comes from user input!</p>

<p>Recently, I wanted a way to build inserts using a matching number of column names and placeholders.</p>

<pre>
def build_query
  <<-STMT
    insert into #{table} ( #{columns()} ) values ( #{column_placeholders()} )
  STMT
end
</pre>

<p>In this case, columns and column_placeholders were both functions.</p>
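<p>The post doesn't show those helpers, so here is one way they might look. Treat it as a sketch: the <code>COLUMN_NAMES</code> list, the <code>table</code> method, and the use of <code>?</code> as the placeholder are all assumptions made up for the example.</p>

```ruby
# Hypothetical column list; the real one would come from your own schema.
COLUMN_NAMES = %w[id name email].freeze

# Hypothetical table name helper.
def table
  'customer'
end

# Comma-separated column names for the insert statement.
def columns
  COLUMN_NAMES.join(', ')
end

# One '?' per column, so names and placeholders always match in number.
def column_placeholders
  (['?'] * COLUMN_NAMES.size).join(', ')
end

def build_query
  <<-STMT
    insert into #{table} ( #{columns()} ) values ( #{column_placeholders()} )
  STMT
end

puts build_query.strip
```

<p>Because both helpers read from the same list, adding a column in one place keeps the two halves of the statement in step.</p>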

<p>One oddity I ran into is the combination of here documents and block syntax. RubyDBI lets you pass a block when executing a query, the same way you would pass a block to File::open(). The block gets a &quot;statement handle&quot;, which gets cleaned up when the block completes.</p>

<pre>
  dbh.execute(query) { |sth| 
    sth.fetch() { |row|
      # do something with the row
    }
  }
</pre>

<p>Combining these two lets you write something that looks like SQL invading Ruby:</p>

<pre>
  dbh.execute(<<-STMT) { |sth|
      select distinct customer, business_unit_id, business_unit_key_name
       from problem_ticket_lz
       order by customer
    STMT
    sth.fetch { |row|
      print "#{row[1]}\t#{row[0]}\t#{row[2]}\n"
    }
  }
</pre>

<p>This looks pretty good overall, but take a look at how the block opening interacts with the here doc. The here doc appears to be line-oriented, so it always begins on the line after the &lt;&lt;-STMT token. On the other hand, the block open follows the function, so the here doc gets lexically interpolated in the middle of the block, even though it has no syntactic relation to the block. No real gripe, just an oddity.</p>
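<p>You can see the same layout without a database. In this sketch, <code>run_query</code> and <code>customer_lines</code> are hypothetical stand-ins that just hand the here doc's lines to the block, which is enough to show where the parser puts each piece.</p>

```ruby
# Stand-in for dbh.execute: yields the stripped lines of the SQL text.
def run_query(sql)
  yield sql.lines.map(&:strip)
end

def customer_lines
  collected = nil
  # The block opens on the same line as the <<-STMT token, but the here doc
  # body still occupies the following lines, up to the STMT terminator.
  run_query(<<-STMT) { |lines|
      select distinct customer
       from problem_ticket_lz
    STMT
    collected = lines
  }
  collected
end

p customer_lines
```

<p>The block body only starts after the terminator line, even though its opening brace appears before the here doc's text.</p>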
 
]]></content></entry><entry><title>Beautiful Architecture</title><link href="https://michaelnygard.com/blog/2009/02/beautiful-architecture/"/><id>https://michaelnygard.com/blog/2009/02/beautiful-architecture/</id><published>2009-02-05T15:45:56-06:00</published><updated>2009-02-05T15:45:56-06:00</updated><content type="html"><![CDATA[<p><a href="https://www.amazon.com/gp/product/059651798X?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=059651798X"><img src="/images/blog/beautiful-architecture-cover.jpg" alt="Beautiful Architecture book cover" style="float:left; margin-right:10px; margin-bottom:5px;" width="180" height="236" /></a>

O'Reilly has released &quot;<a href="http://www.amazon.com/gp/product/059651798X?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=059651798X">Beautiful Architecture</a>,&quot; a compilation of essays by software and system architects. I'm happy to announce that I have a chapter in this book. The finished book is shipping now, and available through Safari. I think the whole thing has turned out amazingly well, both instructive and interesting.</p>

<p>One of the editors, Diomidis Spinellis, has posted an <a href="http://www.spinellis.gr/blog/20090204/">excellent description and summary</a>.<br /><br /><br /><br /><br /><br /><br /></p> 
]]></content></entry><entry><title>Another Cause of TNS-12541</title><link href="https://michaelnygard.com/blog/2009/02/another-cause-of-tns-12541/"/><id>https://michaelnygard.com/blog/2009/02/another-cause-of-tns-12541/</id><published>2009-02-05T15:13:21-06:00</published><updated>2009-02-05T15:13:21-06:00</updated><content type="html"><![CDATA[<p>There are about a jillion forum posts and official pages on the web that talk about ORA-12541, the infamous &quot;TNS:No Listener&quot; error. Somewhere around 70% of them appear to be link-farmers who just scrape all the Oracle forums and mailing lists.&nbsp; Virtually all of the pages just refer back to the official definition from Oracle, which says &quot;there's no listener running on the server&quot; and tells you to log in to the server as admin and start up the listener.<br /></p><p>Not all that useful, especially if you're not the DBA.</p><p>I found a different way that you can get the same error code, even when the listener is running. Special thanks to blogger <a href="http://boomslaang.wordpress.com/">John Jacob</a>, whose <a href="http://boomslaang.wordpress.com/2008/06/30/ora-12541-tnsno-listener/">post</a> didn't quite solve my problem, but did set me on the right track.</p><p>Here's my situation. My client is a laptop connecting to the destination network through a VPN client. I'm connecting to an Oracle 10g service with 2 nodes. Tnsping reported success, the connection assistant could connect successfully, but sqlplus always reported TNS-12541 TNS:No listener.&nbsp; The listener was fine.</p><p>Turning on client side tracing, I saw that the initial connection attempt to the service VIP was successful, but that the server then sends back a packet with the hostname of a specific node to use. Here's where the problem begins.</p><p>Thanks to some quirk in the VPN configuration, I can only resolve DNS names on the VPN if they're fully qualified. 
The default search domain just flat doesn't work.&nbsp; So, I can resolve proddb02.example.com but not proddb02. That's the catch, because the database sends back just the host portion of the node, not the FQDN. DNS resolution fails, but sqlplus reports it as &quot;No listener&quot;, rather than saying &quot;Host not found&quot; or something useful like that.</p><p>Again, there are a jillion posts and articles telling network admins how to fix the default domain search on a VPN concentrator. And, again, I'm not the network admin, either.</p><p>The best I can do as a user is work around this issue by adding the IPs of the physical DB nodes to the hosts file on my own client machine.&nbsp; Sure, some day it'll break when we re-address the DB nodes, and I will have long forgotten that I even put those addresses in C:\Windows\System32\Drivers\etc\hosts. Still, at least it works for now.<br /></p> 
]]></content></entry><entry><title>Using a custom WindowProc from Ruby</title><link href="https://michaelnygard.com/blog/2009/01/using-a-custom-windowproc-from-ruby/"/><id>https://michaelnygard.com/blog/2009/01/using-a-custom-windowproc-from-ruby/</id><published>2009-01-26T09:24:06-06:00</published><updated>2009-01-26T09:24:06-06:00</updated><content type="html"><![CDATA[<p>
This is off the beaten path today, maybe even off the whole reservation. Still, I searched for some code to do this, and couldn't find it. Maybe this will help somebody else trying to do the same thing.
</p>

<p>
I'm currently prototyping a desktop utility using Ruby and wxRuby. The combination actually makes Windows desktop programming palatable, which is a very pleasant surprise.
</p>

<p>
Part of what I'm doing involves showing messages with Snarl. I want my Ruby program to generate messages that can be clicked. Snarl is happy to tell you that your message has been clicked. It does it by sending your window a message, using whatever message code you want.
</p>

<p>
So, for example, if I want to get a <code>WM_USER</code> message back, then I create a new notification like this:
</p>

<pre>
@msg = Snarl.new('Clickable message', {:message => 'Click me, please!', :timeout => Snarl::NO_TIMEOUT, :reply_window => @win_handle, :reply_window_message => Windows::WM_USER})
</pre>

<p>
If the user clicks on my message, I'll get a <code>WM_USER</code> event delivered to my window (identified by <code>@win_handle</code>). Since I'm using wxRuby, which wraps <a href="http://www.wxwidgets.org">wxWidgets</a>, that presents a bit of a problem. Although wxWidgets allows you to subclass its default window proc, wxRuby does not. A couple of forum posts suggested using the Windows API to hook the window proc, which is what I did.
</p>

<p>
Here's the code:
</p>

<pre>
begin
  require 'rubygems'
rescue LoadError
end
</pre>

<p>I installed wxRuby as a gem, so that's boilerplate.</p>

<pre>
require 'lib/snarl'
require 'wx'
require 'windows/api'

module WindProc
  include Windows
  
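  # Index passed to SetWindowLong to replace the window procedure.
  GWL_WNDPROC = -4

  # Message code used for Snarl replies. Win32 itself defines WM_USER as
  # 0x0400; 0x04FF falls in the private message range that starts there.
  WM_USER = 0x04FF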

  API.auto_namespace = 'WindProc'
  API.auto_constant = true

  API.new('SetWindowLong', 'LIK', 'L', 'user32')
  API.new('CallWindowProc', 'PIIIL', 'L', 'user32')
end
</pre>

<p>
This module just gets me access to the Windows API functions <code>SetWindowLong</code> and <code>CallWindowProc</code>. <code>SetWindowLong</code> is deprecated in favor of <code>SetWindowLongPtr</code>, but I couldn't get that to load properly through the windows/api module. At some point, when you're prototyping something, you just have to decide not to solve every puzzle, especially if you can find a workable alternative.
</p>

<p>
<code>API.new()</code> constructs a Ruby object implemented by some C native code. It uses the prototype string in the second argument to translate Ruby parameters into C values when you eventually call the API function.  The conversion is done in glue code that knows how to map some Ruby primitives to C values, but it's not all that bright. In particular, there's no way to introspect on the Win32 API itself to see if you're lying to the glue code. In fact, I'm lying a little bit here.  The prototype I used---&apos;LIK&apos;---tells the API module that I'm looking for a function that takes a long, an integer, and a callback.  Strictly speaking, this should have been &apos;LIL&apos;, but I needed the glue code to convert a Ruby procedure into a C pointer.
</p>
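<p>
To make the prototype strings easier to read, here is a small pure-Ruby sketch that spells them out. It is illustration only: <code>PROTOTYPE_LETTERS</code> and <code>describe_prototype</code> are hypothetical names, not part of the windows/api module. The meanings for L, I, and K come from the description above, and reading P as "pointer" is an assumption.
</p>

```ruby
# Hypothetical decoder for the prototype strings passed to API.new.
# 'L', 'I', and 'K' follow the description in the text; 'P' as a
# pointer is an assumption.
PROTOTYPE_LETTERS = {
  'L' => 'long',
  'I' => 'integer',
  'K' => 'callback',
  'P' => 'pointer'
}.freeze

# Returns one human-readable type name per letter in the prototype.
def describe_prototype(prototype)
  prototype.each_char.map { |c| PROTOTYPE_LETTERS.fetch(c) { "unknown (#{c})" } }
end

puts describe_prototype('LIK').join(', ')    # the SetWindowLong prototype
puts describe_prototype('PIIIL').join(', ')  # the CallWindowProc prototype
```

<p>
Nothing here checks the strings against the real Win32 signatures, which is exactly the gap described above.
</p>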

<p>
The next section defines a subclass of Wx::Frame, the base type for all standalone windows.
</p>

<pre>
class HookedFrame < Wx::Frame
  def initialize(parent, id, title)
    super(parent, -1, title)

    evt_window_create() { |event| on_window_create(event) }
  end
</pre>

<p>I register a handler for the window create event. At this point, I'm still within the bounds of wxWidgets' own event handling framework. The interesting bits happen inside the <code>on_window_create</code> method.</p>

<pre>
  def on_window_create(event)
    @old_window_proc = 0
    @my_window_proc = Win32::API::Callback.new('LIIL', 'I') { |hwnd, umsg, wparam, lparam|
      if not self.hooked_window_proc(hwnd, umsg, wparam, lparam) then
        WindProc::CallWindowProc.call(@old_window_proc, hwnd, umsg, wparam, lparam)
      end
    }
    @old_window_proc = WindProc::SetWindowLong.call(self.handle, WindProc::GWL_WNDPROC, @my_window_proc)
  end
</pre>

<p>
There are several juicy bits here. First, I'm using <code>Win32::API::Callback.new()</code> to create a callback object. How does this get used? It's a little roundabout. When I call <code>WindProc::SetWindowLong()</code>, I pass the callback object. (This is why I used &apos;LIK&apos; as the prototype string earlier.) Now, <code>WindProc::SetWindowLong()</code> isn't just a pointer to the native Windows library function. It's actually a Ruby object that wraps the library function. The API object is implemented by C code. Like the API object, the callback object is a Ruby object implemented by C code. In particular, it has an ivar that points to a Ruby procedure. Because I passed a block to <code>Callback.new()</code>, the block itself will be the procedure. Inside <code>API.call()</code>, any argument of type &quot;K&quot; gets set as the &quot;active callback&quot; and then substituted with a C function called <code>CallbackFunction</code>.  <code>CallbackFunction</code> looks up the active callback, translates parameters according to the callback's prototype, then tells Ruby to invoke the proc associated with the callback.
</p>

<p>
Whew.
</p>

<p>
So, I call <code>SetWindowLong.call()</code>, passing it the <code>Callback</code> I created with a block. <code>SetWindowLong.call()</code> ultimately calls the Windows DLL function <code>SetWindowLong</code>, passing it the address of <code>CallbackFunction</code>. When Windows calls <code>CallbackFunction</code>, it looks up the Ruby Callback object and invokes its procedure.
</p>

<p>
Another oddity. For some reason, although the callback object has an instance variable called <code>@function</code>, there seems to be no way to set it after construction. If you pass a block, <code>@function</code> will point to the block. If you don't, <code>@function</code> will be <code>nil</code>, with no way to set it to anything else. In other words, the API will happily let you create useless <code>Callback</code> objects.
</p>

<p>
The rest is easy. Inside my block, I just call out to a method that can be overridden by descendants of <code>HookedFrame</code>.  My test implementation just blurts out some stuff to let me know the plumbing is working.
</p>

<pre>
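  def hooked_window_proc(hwnd, uMsg, wParam, lParam)
    puts "In the hook: 0x#{uMsg.to_s(16)}\t#{wParam}\t#{lParam}\n"
    if uMsg == NotifierApp::WM_USER then
      puts "That's what I've been waiting to hear:\t#{wParam}\t#{lParam}\n"
      true   # handled here, so the caller skips the original window proc
    else
      false  # not handled, so the caller falls through to CallWindowProc
    end
  end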
</pre>

<p>
As I reviewed this post, I realized something else. <code>ActiveCallback</code> is static in the C glue code. That means there can only be one callback set at a time. If I called some other Windows API function with its own callback, that would overwrite the reference to my Ruby code. But, Windows would still keep calling to the same pointer as before. In other words, calling any other Windows API function that takes a callback would cause that callback to become my window proc!  Yikes!
</p>
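<p>
Here is a pure-Ruby model of that hazard. It does not touch the real glue code: <code>GlueModel</code>, <code>register</code>, and <code>trampoline</code> are made-up names standing in for the static <code>ActiveCallback</code> slot and <code>CallbackFunction</code>, under the assumption that every callback shares one slot and one function pointer.
</p>

```ruby
# Pure-Ruby model of a single shared callback slot. This is a sketch of
# the hazard, not the actual win32-api implementation.
module GlueModel
  @active_callback = nil  # one slot for every API call, like a C static

  # Stand-in for passing a 'K' argument: remember the proc, then hand
  # back the same trampoline regardless of which proc was given.
  def self.register(callback)
    @active_callback = callback
    method(:trampoline)
  end

  # Stand-in for CallbackFunction: dispatches to whatever is active now.
  def self.trampoline(*args)
    @active_callback.call(*args)
  end
end

window_proc = GlueModel.register(->(msg) { "window proc saw #{msg}" })
GlueModel.register(->(msg) { "other callback saw #{msg}" })

# The caller still holds the pointer it got first, but that pointer now
# reaches the second proc.
puts window_proc.call(0x0400)
```

<p>
In the model, the first registration silently starts running the second block, which is the same failure I would expect from the real thing.
</p>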

<p>
Overall, this works, but seems like a kludge. Ironically, even as I got this working, I started getting dissatisfied with Snarl itself. I think I need more flexibility to display persistent information, rather than just alerts.
</p>
 
]]></content></entry><entry><title>OTUG Tonight</title><link href="https://michaelnygard.com/blog/2009/01/otug-tonight/"/><id>https://michaelnygard.com/blog/2009/01/otug-tonight/</id><published>2009-01-20T12:09:02-06:00</published><updated>2009-01-20T12:09:02-06:00</updated><content type="html"><![CDATA[<p>This evening, I'm speaking at <a href="http://www.otug.org/">OTUG</a>. The topic is &quot;Clouds, Grids, and Fog&quot;.</p><p>There's no denying that &quot;cloud&quot; has become a huge buzzword. It's a crossover trend, too. It's not just the CIO who is interested in cloud computing. It's the CFO and the CMO, too. (Not to mention the CSO, if there is one.)&nbsp; Underneath the buzz, though, there is something real and valuable. </p><p>I will talk about the driving trends that are leading us toward cloud computing and how it differs from grids and software-as-a-service. I'll also talk at length about the architectural implications and effects of running your software on a cloud.</p><p>If you live in the Twin Cities, I hope to see you there. <br /></p> 
]]></content></entry><entry><title>Attack of Self-Denial, 2008 Style</title><link href="https://michaelnygard.com/blog/2008/12/attack-of-self-denial-2008-style/"/><id>https://michaelnygard.com/blog/2008/12/attack-of-self-denial-2008-style/</id><published>2008-12-13T10:42:09-06:00</published><updated>2008-12-13T10:42:09-06:00</updated><content type="html"><![CDATA[<blockquote><p>&quot;Good marketing can kill your site at any time.&quot;</p><p>--Paul Lord, 2006 <br /></p></blockquote><p>I just learned of another attack of self-denial from this past week.</p><p>Many retailers are suffering this year, particularly in the brick-and-mortar space. I have heard from several, though, who say that their online performance is not suffering as much as the physical stores are. In some cases, where the brand is strong and the products are not fungible, the online channel is showing year-over-year growth.</p><p>One retailer I know was running strong, with the site near its capacity. They fit the bill for an online success in 2008. They have great name recognition, a very strong, global brand, and their customers love their products. This past week, their marketing group decided to &quot;take it to the next level.&quot;</p><p>They blasted an email campaign to <em>four million</em> customers.&nbsp; It had a good offer, no qualifier, and a very short expiration time---one day only.&nbsp; A short expiration like that creates a sense of urgency.&nbsp; Good marketing reaches people and induces them to act, and in that respect, the email worked. Unfortunately, when that means millions of users hitting your site, you may <a href="/blog/2007/03/selfinflicted-wounds/">run into trouble</a>.</p><p>Traffic flooded the site and knocked it offline. It took more than 6 hours to get everything functioning again.</p><p>Instead of getting an extra bump in sales, they lost six hours of holiday-season revenue. 
As a rule of thumb, you should assume that a peak hour of holiday sales counts for six hours of off-season sales. </p><p>There are other technological solutions to help with this kind of traffic flood. For instance, the UX group can create a static landing page for the offer. Then marketing links to that static page in their email blast. Ops can push that static page out into their cache servers, or even into their CDN's edge network. This requires some preparation for each offer, and it takes some extra preparation before the first such offer, but it's very effective. The static page absorbs the bulk of the traffic, so only customers who really want to buy get passed into the dynamic site.</p><p>Marketing can also send the email out in waves, so people receive it at different times. That spreads the traffic spike out over a few hours. (Though this doesn't work so well when you send the waves throughout the night, because customers will all see it in a couple of hours in the morning.) <br /></p><p>In really extreme cases, a portion of capacity can be carved out and devoted to handling promotional traffic. That way, if the promotion goes nuclear, at least the rest of the site is still online. Obviously, this would be more appropriate for a long-running promotion than a one-day event.</p><p>Of course, it should be obvious that all of these technological solutions depend on good communication. <br /></p><p>At a surface level, it's easy to say that this happened because marketing had no idea how close to the edge the site was already running. That's true. It's also true, however, that operations previously had no idea what the capacity was. If marketing called and asked, &quot;Can we support 4 million extra visits?&quot; the current operations group could have answered &quot;no&quot;. Previously, the answer would have been &quot;I don't know.&quot;</p><p>So, operations got better, but marketing never got re-educated. 
Lines of communication were never opened, or re-opened. Better communication would have helped. </p><p>In any online business, you must have close communications between marketing, UX, development, and operations. They need to regard themselves as part of one integrated team, rather than four separate teams. I've often seen development groups that view operations as a barrier to getting <em>their</em> stuff released. UX and marketing view development as the barrier to getting their ideas implemented, and so on. This dynamic evolves from the &quot;throw it over the wall&quot; approach, and it can only result in finger-pointing and recriminations.<br /></p><p>I'd bet there's a lot of finger-pointing going on in that retailer's hallways this weekend.</p> 
]]></content></entry><entry><title>(Human | Pattern) Languages, part 2</title><link href="https://michaelnygard.com/blog/2008/12/human-pattern-languages-part-2/"/><id>https://michaelnygard.com/blog/2008/12/human-pattern-languages-part-2/</id><published>2008-12-08T00:56:17-06:00</published><updated>2008-12-08T00:56:17-06:00</updated><content type="html"><![CDATA[<blockquote><em>At the conclusion of the modulating bridge, we expect to be in the contrasting key of C minor. Instead, the bridge concludes in the distantly related key of F sharp major... Instead of resolving to the tonic, the cadence concludes with two isolated E pitches. They are completely ambiguous. They could belong to E minor, the tonic for this movement. They could be part of E major, which we've just heard peeking out from behind the minor mode curtains. [He] doesn't resolve them into a definite key until the beginning of the third movement, characteristically labeled a &quot;Scherzo&quot;.</em></blockquote>

<p>In my last post, I lamented the missed opportunity we had to create a true pattern language about software. Perhaps calling it a missed opportunity is too pessimistic. Bear with me on a bit of a tangent. I promise it comes back around in the end.</p>

<p>The example text above is an amalgam of a lecture series I've been listening to. I'm a big fan of <a href="http://www.teach12.com/">The Teaching Company</a> and their courses. In particular, I've been learning about the meaning and structure of classical, baroque, romantic, and modern music from Professor <a href="http://www.teach12.com/storex/professor.aspx?id=3">Robert Greenberg</a>.<a href="#hpl2_1" class="footnote">1</a> The sample I used here is from a series on <a href="http://www.teach12.com/ttcx/coursedesclong2.aspx?pc=Professor&amp;cid=7250">Beethoven's piano sonatas</a>. This isn't an actual quote, but a condensation of statements from one of the lectures. I'm not going to go into all the music theory behind this, but it is interesting.<a href="#hpl2_2" class="footnote">2</a></p>

<p>There are two things I want you to observe about the sample text. First, it's <em>loaded</em> with jargon. It has to be! You'd exhaust the conversational possibilities about the best use of a D-sharp pretty quickly. Instead, you'll talk about structures, tonalities, relationships between that D-sharp and other pitches. (D-sharp played together with a C? Very different from a quick sequence of D-sharp, E, D-sharp, C.) You can be sure that composers don't think in terms of individual notes. A D-sharp by itself doesn't mean anything. It only acquires meaning by its relation to other pitches. Hence all that stuff about keys---tonic, distantly related, contrasting. &quot;Key&quot; is a construct for discussing whole collections of pitches in a kind of shorthand. To a musician, there's a world of difference between G major and A flat minor, even though their basic pitches (the <em>tonics</em>) are only one half-step apart.</p>

<p>Also notice that the text addresses some structural features. The purpose and structure of a modulating bridge is pretty well understood, at least in certain circles. The notion that you can have an &quot;expected&quot; key certainly implies that there are rules for a sonata. In fact, the term &quot;sonata&quot; itself means some fairly specific things<a href="#hpl2_3" class="footnote">3</a>... although to know whether we're talking about &quot;a sonata&quot; or &quot;a movement in sonata form&quot; requires some additional context.</p>

<p>In fact, this paragraph is all about context. It exists in the context of late Classical, early Romantic era music, specifically the music of Beethoven. In the Classical era, musical forms---such as sonata form---pretty much dictated the structure of the music. The number of movements, their relationships to each other, their keys, and even their tempos were well understood. A contemporary listener had every reason to expect that a first movement would be fast and bright, and if the first movement was in C major, then the second, slower movement would be a minuet and trio in G major.</p>

<p>Music and music theory have evolved over the last thousand-odd years. We have a vocabulary---the potentially off-putting jargon of the field. We have nesting, interrelating contexts. Large scale patterns (a piano sonata) create context for medium scale patterns (the first movement &quot;allegretto&quot;) which in turn, create context for the medium and small scale patterns (the first theme in the allegretto consists of an ABA'BA phrasing, in which the opening theme sequences a motive upward over octaves.)&nbsp; We even have the ability to talk about non sequiturs---like the modulating bridge above---where deliberate violation of the pattern language is done for effect.<a href="#hpl2_4" class="footnote">4</a></p>

<p>What is all this stuff if it isn't a pattern language?</p>

<p>We can take a few lessons, then, from the language of music.</p>

<p>The first lesson is this: give it time. Musical language has evolved over a long time. It has grown and been pruned back over centuries. New terms are invented as needed to describe new answers to a context. In turn, these new terms create fresh contexts to be exploited with yet other inventions.</p>

<p>Second, any such language <em>must</em> be able to assimilate change. Nothing is lost, even amidst the most radical revolutions. When the Twentieth Century modernists rejected the tonal system, they could only reject the structures and strictures of that language. They couldn't destroy the language itself. Phish plays fugues in concert... they just play them with electric guitars instead of harpsichords. There are Baroque orchestras today. They play in the same concert halls as the Pops and Philharmonics. The homophonic texture of plain chant still exists, and so do the once-heretical polyphony and church-sanctioned monophony. Nothing is lost, but new things can be encompassed and incorporated.</p>

<p>And, mainframes still exist with their COBOL programs, together with distributed object systems, message passing, and web services. The Singleton and Visitor patterns will never truly go away, any more than batch programming will disappear.</p>

<p>Third, we must continue to look at the relationships between different parts of our nascent pattern language. Just as individual objects aren't very interesting, isolated patterns are less interesting than the ways they can interact with each other.</p>

<p>I believe that the true language of software has as much to do with programming languages as the language of music has to do with notes. So, instead of missed opportunity, let us say instead that we are just beginning to discover our true language.</p>

<hr>

<p><a name="hpl2_1">1.</a> Professor Greenberg is a delightful traveling companion. He's witty, knowledgeable and has a way of teaching complex subjects without ever being condescending. He also sounds remarkably like Penn Jillette.</p>

<p><a name="hpl2_2">2.</a> The main reason is that I would surely get it wrong in some details and risk losing the main point of my post here.</p>

<p><a name="hpl2_3">3.</a> And here we see yet another of the complexities of language. The word &quot;sonata&quot; refers, at different times, to a three movement concert work, a single movement in a characteristic structure, a four movement concert work, and in Beethoven's case, to a couple of great fantasias that he declares to be sonatas simply because he says so.</p>

<p><a name="hpl2_4">4.</a> For examples ad nauseam, see Richard Wagner and the &quot;abortive gesture&quot;. </p> 
]]></content></entry><entry><title>(Human | Pattern) Languages</title><link href="https://michaelnygard.com/blog/2008/12/human-pattern-languages/"/><id>https://michaelnygard.com/blog/2008/12/human-pattern-languages/</id><published>2008-12-08T00:19:36-06:00</published><updated>2008-12-08T00:19:36-06:00</updated><content type="html"><![CDATA[<p>We missed the point when we adopted &quot;patterns&quot; in the software world. Instead of an organic whole, we got a bag of tricks.</p>

<p>The commonly accepted definition of a pattern is &quot;a solution to a problem in a context.&quot; This is true, but limiting. This definition loses an essential characteristic of patterns: Patterns relate to other patterns.</p>

<p>We talk about the context of a problem. &quot;Context&quot; is a mental shorthand. If we unpack the context it means many things: constraints, capabilities, style, requirements, and so on. We sometimes mislead ourselves by using the fairly fuzzy, abstract term &quot;context&quot; as a mental handle on a whole variety of very concrete issues. Context includes stated constraints like the functional requirements, along with unstated constraints like, &quot;The computation should complete <em>before</em> the heat death of the universe.&quot; It includes other forces like, &quot;This program is written in C#, so the solution to this problem should be in the same language or a closely related one.&quot; It should not require a supercooled quantum computer, for example.</p>

<p>Where does the context for a small-scale pattern originate?<a class="footnote" href="#1">1</a> Context does not arise ex nihilo. No, the context for a small-scale pattern is created by larger patterns. Large grained patterns create the fabric of forces that we call the context for smaller patterns. In turn, smaller patterns fit into this fabric and, by their existence, they change it. Thus, the small scale patterns create feedback that can either resolve or exacerbate tensions inherent in the larger patterns.</p>

<p>Solutions that respect their context fit better with the rest of the organic whole. It would be strange to be reading some Java code, built into a layered architecture with a relational database for storage, then suddenly find one component that has its own LISP interpreter and some functional code. With all respect to &quot;polyglot programming&quot;, there'd better be a strong motivation for such an odd inclusion. It would be a discontinuity... in other words, it doesn't fit the context I described. That context---the layered architecture, the OO language, relational database---was created by other parts of the system.</p>

<p>If, on the other hand, the system was built as a blackboard architecture, using LISP as glue code over intelligent agents acting asynchronously, then it wouldn't be at all odd to find some recursive lambda expressions. In that context, they fit naturally and the Java code would be an oddity.</p>

<p>This interrelation across scale knits patterns together into a <em>pattern language</em>. By and large, what we have today is a growing group of proper nouns. Please don't get me wrong, the nouns themselves have use. It's very helpful to say &quot;you want a Null Object there,&quot; and be understood. That vocabulary and the compression it provides is really important.</p>

<p>But we shouldn't mistake a group of nouns for a real pattern language. A language is more than just its nouns. A language also implies ways of connecting statements sensibly. It has idioms and semantics and semiotics.<a class="footnote" href="#2">2</a> In a language, you can have dialog and argumentation.&nbsp; Imagine a dialog in patterns as they exist today:</p>

<p>&quot;Pipes and filters.&quot;</p>

<p>&quot;Observer?&quot;</p>

<p>&quot;Chain of Responsibility!&quot;</p>

<p>You might be able to make a comedy sketch out of that, but not much more. We cannot construct meaningful dialogs about patterns at all scales.</p>

<p>What we have are fragments of what might become a pattern language. GoF, the PLoPD books, the PoSA books... these are like a few charted territories on an unmapped continent. We don't yet have the language that would even let us relate these works together, let alone relating them to everything else.</p>

<p>Everything else?&nbsp; Well, yes. By and large, patterns today are an outgrowth of the object-oriented programming community.&nbsp; I contend, however, that &quot;object-oriented&quot; <em>is</em> a pattern! It's a large-scale pattern that creates really significant context for all the other patterns that can work within it. Solutions that work within the &quot;object-oriented&quot; context make no sense in an actor-oriented context, or a functional context, or a procedural context, and so on. Each of these other large-scale patterns admits different solutions to similar problems: persistence, user interaction, and system integration, to name a few. I can imagine a pattern called &quot;Event Driven&quot; that would work very well with &quot;Object oriented&quot;, &quot;Functional&quot;, and &quot;Actor Oriented&quot;, but somewhat less well with &quot;Procedural programming&quot;, and conflict utterly with &quot;Batch Processing&quot;. (Though there might be a link between them called &quot;Buffer file&quot; or something like that.)</p>

<p>That's the piece that we missed. We don't have a pattern language yet. We're not even close.</p>

<hr>

<p><a name="1">1.</a> By &quot;large&quot; and &quot;small&quot;, I don't mean to imply that patterns simply nest hierarchically. It's more complex and subtle than that. When we do have a real pattern language, we'll find that there are medium-grained patterns that work together with several, but not all, of the large ones. Likewise, we'll find small-scale patterns that make medium sized ones more or less practical. It's not a decision tree or a heuristic.</p>

<p><a name="2">2.</a> That's what keeps &quot;Fill the idea with blue&quot; from being a meaningful sentence. All the words work, and they're even the right parts of speech, yet the sentence as a whole doesn't fit together. </p> 
]]></content></entry><entry><title>Connection Pools and Engset</title><link href="https://michaelnygard.com/blog/2008/12/connection-pools-and-engset/"/><id>https://michaelnygard.com/blog/2008/12/connection-pools-and-engset/</id><published>2008-12-03T09:35:25-06:00</published><updated>2008-12-03T09:35:25-06:00</updated><content type="html"><![CDATA[<p>In my <a href="/blog/2008/11/thread_pools_and_erlang_models.html">last post</a>, I talked about using Erlang models to size the front end of a system. By using some fundamental capacity models that are almost a century old, you can estimate the number of request handling threads you need for a given traffic load and request duration.</p>

<h2>Inside the Box</h2>

<p>It gets tricky, though, when you start to consider what happens inside the server itself. Processing the request usually involves some kind of database interaction with a connection pool. (There are many ways to avoid database calls, or at least minimize the damage they cause. I'll address some of these in a future post, but you can also check out <a href="/blog/2007/11/two-ways-to-boost-your-flaggin/">Two Ways to Boost Your Flagging Web Site</a> for starters.) Database calls act like a kind of &quot;interior&quot; request that can be considered to have its own probability of queuing.</p>

<p><img width="473" height="266" style="border: none;" title="Exterior call to server becomes an &quot;interior&quot; call to a database." alt="Exterior call to server becomes an &quot;interior&quot; call to a database." src="/images/blog/engset/interior_call_structure.png" /> </p>

<p>Because this interior call can block, we have to consider what effects it will have on the duration of the exterior call. In particular, the exterior call must take at least the sum of the blocking time plus the processing time for the interior call.</p>

<p>At this point, we need to make a few assumptions about the connection pool. First, the connection pool is finite. Every connection pool should have a ceiling. If nothing else, the database server can only handle a finite number of connections. Second, I'm going to assume that the pool blocks when exhausted. That is, calling threads that can't get a connection right away will happily wait forever rather than abandoning the request. This is a simplifying assumption that I need for the math to work out. It's not a good configuration in practice!</p>
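<p>A minimal Ruby sketch of a pool with those two properties might look like the following. <code>BlockingPool</code> and its method names are hypothetical, and the string "connections" are placeholders rather than real database handles. It shows the assumed behavior only: a fixed ceiling and a checkout that waits indefinitely.</p>

```ruby
require 'thread'

# Hypothetical finite pool that blocks when exhausted. A sketch of the
# modeling assumption, not a production connection pool.
class BlockingPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "connection-#{i}" }  # placeholder connections
  end

  # Blocks the calling thread until a connection is free. No timeout.
  def checkout
    @slots.pop
  end

  def checkin(connection)
    @slots << connection
  end

  # Number of idle connections right now.
  def available
    @slots.size
  end

  # Borrow a connection for the duration of the block, then return it.
  def with_connection
    connection = checkout
    yield connection
  ensure
    checkin(connection) if connection
  end
end

pool = BlockingPool.new(2)
first = pool.checkout
second = pool.checkout
puts pool.available  # prints 0: a third checkout would block here
pool.checkin(first)
puts pool.available  # prints 1
```

<p>A real pool would normally give <code>checkout</code> a timeout so that a request handling thread can give up instead of waiting forever.</p>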

<p>With these assumptions in place, I can predict the probability of blocking within the interior call. It's a formula closely related to the Erlang model from my last post, but with a twist. The Erlang models assume an essentially infinite pool of requestors. For this interior call, though, the pool of requestors is quite finite: it's the number of request handling threads for the exterior calls. Once all of those threads are busy, there aren't any left to generate more traffic on the interior call!</p>

<p>The formula to compute the blocking probability with a finite number of sources is the <a href="http://en.wikipedia.org/wiki/Engset_calculation">Engset formula</a>. Like the Erlang models, Engset originated in the world of telephony. It's useful for predicting the outbound capacity needed on a private branch exchange (PBX), because the number of possible callers is known. In our case, the request handling threads are the callers and the connection pool is the PBX.</p>
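<p>For readers who want to experiment, here is a hedged Ruby sketch of one common closed form of the Engset blocking probability. It takes the number of connections, the offered traffic per <em>idle</em> source, and the number of sources. The function names and the 0.5 intensity are illustrative assumptions, and this is not necessarily the exact form or the inputs behind the table in the next section.</p>

```ruby
# Binomial coefficient C(n, k) using exact integer arithmetic.
def binomial(n, k)
  return 0 if k < 0 || k > n
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Engset blocking probability (call congestion form):
#   C(S-1, N) * a^N / sum over i in 0..N of C(S-1, i) * a^i
# servers   = N, connections in the pool
# intensity = a, offered traffic per idle source (an assumed input)
# sources   = S, request handling threads that can call the pool
def engset_blocking(servers, intensity, sources)
  numerator = binomial(sources - 1, servers) * intensity**servers
  denominator = (0..servers).sum { |i| binomial(sources - 1, i) * intensity**i }
  numerator / denominator.to_f
end

# Illustrative only: 40 sources with an arbitrary per-source intensity.
[10, 20, 30, 40].each do |n|
  puts format('N=%2d  blocking=%.5f%%', n, 100 * engset_blocking(n, 0.5, 40))
end
```

<p>Two sanity checks fall out of the formula: with zero connections every request blocks, and with as many connections as sources nothing can block.</p>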

<h2>Practical Example</h2>

<p>Using our 1,000,000 page views per hour from last time, Table 1 shows the Engset table for various numbers of connections in the pool. This assumes that the application server has a maximum of 40 request handling threads. This also supposes that the database processing time uses 200 milliseconds of the 250 milliseconds we measured for the exterior call.</p>

<table>
<tr><th>N</th><th>Engset(N,A,S)</th></tr>
<tr><td>0</td><td>100.00000%</td></tr>
<tr><td>1</td><td>98.23183%</td></tr>
<tr><td>2</td><td>96.37740%</td></tr>
<tr><td>3</td><td>94.43061%</td></tr>
<tr><td>4</td><td>92.38485%</td></tr>
<tr><td>5</td><td>90.23293%</td></tr>
<tr><td>6</td><td>87.96709%</td></tr>
<tr><td>7</td><td>85.57891%</td></tr>
<tr><td>8</td><td>83.05934%</td></tr>
<tr><td>9</td><td>80.39867%</td></tr>
<tr><td>10</td><td>77.58656%</td></tr>
<tr><td>11</td><td>74.61210%</td></tr>
<tr><td>12</td><td>71.46397%</td></tr>
<tr><td>13</td><td>68.13065%</td></tr>
<tr><td>14</td><td>64.60087%</td></tr>
<tr><td>15</td><td>60.86421%</td></tr>
<tr><td>16</td><td>56.91211%</td></tr>
<tr><td>17</td><td>52.73932%</td></tr>
<tr><td>18</td><td>48.34604%</td></tr>
<tr><td>19</td><td>43.74105%</td></tr>
<tr><td>20</td><td>38.94585%</td></tr>
<tr><td>21</td><td>34.00023%</td></tr>
<tr><td>22</td><td>28.96875%</td></tr>
<tr><td>23</td><td>23.94730%</td></tr>
<tr><td>24</td><td>19.06718%</td></tr>
<tr><td>25</td><td>14.49235%</td></tr>
<tr><td>26</td><td>10.40427%</td></tr>
<tr><td>27</td><td>6.97050%</td></tr>
<tr><td>28</td><td>4.30152%</td></tr>
<tr><td>29</td><td>2.41250%</td></tr>
<tr><td>30</td><td>1.21368%</td></tr>
<tr><td>31</td><td>0.54082%</td></tr>
<tr><td>32</td><td>0.21081%</td></tr>
<tr><td>33</td><td>0.07093%</td></tr>
<tr><td>34</td><td>0.02028%</td></tr>
<tr><td>35</td><td>0.00483%</td></tr>
<tr><td>36</td><td>0.00093%</td></tr>
<tr><td>37</td><td>0.00014%</td></tr>
<tr><td>38</td><td>0.00002%</td></tr>
<tr><td>39</td><td>0.00000%</td></tr>
<tr><td>40</td><td>0.00000%</td></tr>
</table>

<p>Notice that when we get to 18 connections in the pool, the probability of blocking drops below 50%.&nbsp; Also, notice how sharply the probability of blocking drops off around 23 to 31 connections in the pool. This is a decidedly nonlinear effect!</p>

<p>From this table, it's clear that even though there are 40 request handling threads that could call into this pool, there's not much point in having more than 30 connections in the pool. At 30 connections, the probability of blocking is already down to about 1.2%, meaning that the queuing time is only going to add a few milliseconds to the average request.</p>

<p>Why do we care? Why not just crank up the connection pool size to 40? After all, if we did, then no request could ever block waiting for a connection. That would minimize latency, wouldn't it?</p>

<p>Yes, it would, but at a cost. Increasing the number of connections to the database by a third means more memory and CPU time on the database just managing those connections, even if they're idle. If you've got two app servers, then the database probably won't notice an extra 10 connections. Suppose you scale out at the app tier, though, and you now have 50 or 60 app servers. You'd better believe that the DB will notice an extra 500 to 600 connections. They'll affect memory needs, CPU utilization, and your ability to fail over correctly when a database node goes down.</p>
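<p>The arithmetic behind those numbers is simple enough to sketch in Ruby. <code>extra_connections</code> is a hypothetical helper whose defaults are the 40 threads and 30 connections from the example above.</p>

```ruby
# Extra database connections created by sizing every pool at the thread
# ceiling instead of the smaller size suggested by the table.
# (Hypothetical helper; the defaults come from the example in the text.)
def extra_connections(app_servers, threads_per_server: 40, pool_size: 30)
  app_servers * (threads_per_server - pool_size)
end

[2, 50, 60].each do |servers|
  puts "#{servers} app servers: #{extra_connections(servers)} extra connections"
end
```

<p>The cost per server stays constant, but the database sees the total.</p>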

<h2>Feedback and Coupling</h2>

<p>There's a strong coupling between the total request duration in the interior call and the request duration for the exterior call. If we assume that every request must go through the database call, then the exterior response time must be strictly greater than the interior blocking time plus the interior processing time.</p>

<p>In practice, it actually gets a little worse than that, as this causal loop diagram illustrates.<br /></p><p>&nbsp;<img width="634" height="233" border="0" src="/images/blog/engset/interior_exterior_coupling.png" style="float: none; border: none;" alt="Time dependencies between the interior call and the exterior call." title="Time dependencies between the interior call and the exterior call." /></p>

<p>It reads like this: &quot;As the interior call blocking time increases, the exterior call duration increases. As the exterior call duration increases, the interior call blocking time increases.&quot; This type of representation helps clarify relations between the different layers. It's very often the case that you'll find feedback loops this way. Any time you do find a feedback loop, it means that slowdowns will produce increasing slowdowns. Blocking begets blocking, quickly resulting in a site hang.</p>

<h2>Conclusions</h2>

<p>Queues are like timing dots. Once you start seeing them, you'll never be able to stop. You might even start to think that your entire server farm looks like one vast, interconnected set of queues.</p>

<p>That's because it is.</p>

<p>People use database connection pools because creating new connections is very slow. Tuning your database connection pool size, however, is all about optimizing the cost of queueing against the cost of extra connections. Each connection consumes resources on the database server and in the application server. Striking the right balance starts by identifying the required exterior response time, then sizing the connection pool---or changing the architecture---so the interior blocking time doesn't break the SLA.</p>

<p>For much, much more on the topic of capacity modeling and analysis, I definitely recommend Neil Gunther's website, <a href="http://perfdynamics.blogspot.com/">Performance Agora</a>. <a type="amzn" search="Neil Gunther" category="books">His books</a> are also a great---and very practical---way to start applying performance and capacity management.</p>
 
]]></content></entry><entry><title>Thread Pools and Erlang Models</title><link href="https://michaelnygard.com/blog/2008/11/thread-pools-and-erlang-models/"/><id>https://michaelnygard.com/blog/2008/11/thread-pools-and-erlang-models/</id><published>2008-11-30T20:32:12-06:00</published><updated>2008-11-30T20:32:12-06:00</updated><content type="html"><![CDATA[<h2>Sizing, Danish Style</h2>

<p>Folks in telecommunications and operations research have used <a href="http://en.wikipedia.org/wiki/Erlang_unit">Erlang</a> models for almost a century. <a href="http://en.wikipedia.org/wiki/Agner_Krarup_Erlang">A. K. Erlang</a>, a Danish telephone engineer, developed these models to help plan the capacity of the phone network and predict the grade of service that could be guaranteed, given some basic metrics about call volume and duration. Telephone networks are expensive to deploy, particularly when upgrading your trunk lines involves digging up large portions of rocky Danish ground or running cables under the North Sea.</p>

<p>The <a href="http://en.wikipedia.org/wiki/Erlang-B">Erlang-B</a> formula predicts the probability that an incoming call cannot be serviced, based on the call arrival rate, average call time, and number of lines available.&nbsp; <a href="http://en.wikipedia.org/wiki/Erlang-C">Erlang-C</a> is similar, but allows for calls to be queued while waiting for service. It predicts the probability that a call will be queued. It can also show when calls will never be serviced, because the rate of arriving calls exceeds the system's total capacity to serve them.</p>

<p>Erlang models are widely used in telecom, including GPRS network sizing, trunk line sizing, call center staffing models, and other capacity planning arenas where request arrival is apparently random. In fact, you can use them to predict the capacity and wait time at a restaurant, bank branch, or theme park, too.</p>

<p>It should be pretty obvious that Erlang models are widely applicable in computer performance analysis, too. There's a rich body of literature on this subject that goes back to the dawn of the mainframe. Erlang models are the foundation of most capacity management groups. I'm not even going to scratch the surface here, except to show how some back-of-the-envelope calculations can help you save millions of dollars.</p>

<h2>One Million Page Views</h2>

<p>In my case, I wanted to look at thread pool sizing. Suppose you have an even 1,000,000 requests per hour to handle. This implies an arrival rate (or lambda) of 0.27777... requests per millisecond. (<a href="http://en.wikipedia.org/wiki/Erlang_unit">Erlang units</a> are dimensionless, but you need to start with the same units of time, whether it's hours, days, or milliseconds.) I'm going to assume for the moment that the system is pretty fast, so it handles a request in 250 milliseconds, on average.</p><p>(Please note that there are many assumptions underneath simple statements like &quot;on average&quot;. For the moment, I'll pretend that request processing time follows a <a href="http://mathworld.wolfram.com/NormalDistribution.html">normal distribution</a>, even though any modern system is more likely to be <a href="http://mathworld.wolfram.com/BimodalDistribution.html">bimodal</a>.)</p>
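<p>The unit conversion above is easy to get wrong, so here is the arithmetic as a short sketch. The variable names are mine; the inputs are the million requests per hour and the 250 ms average from the text.</p>

```python
REQUESTS_PER_HOUR = 1_000_000
MS_PER_HOUR = 60 * 60 * 1000
SERVICE_TIME_MS = 250

# Arrival rate (lambda) in requests per millisecond.
arrival_rate = REQUESTS_PER_HOUR / MS_PER_HOUR

# Offered load in Erlangs: arrival rate times average service time.
# Both factors use milliseconds, so the product is dimensionless.
offered_load = arrival_rate * SERVICE_TIME_MS

print(f"lambda       = {arrival_rate:.5f} requests/ms")
print(f"offered load = {offered_load:.2f} Erlangs")
```

<p>The offered load comes out a little under 69.5 Erlangs. That is the average number of requests in service at once, so 70 is the smallest whole number of threads that can keep up.</p>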

<p>Table 1 shows a portion of the Erlang-C table for these parameters. Feel free to double-check my work with <a href="/downloads/ErlangTables.xls" target="_blank">this spreadsheet</a> or this <a href="/downloads/erlang2.c" target="_blank">short C program</a> to compute the Erlang-B and Erlang-C values for various numbers of threads. (Thanks to Kenneth J. Christensen for the original program. I can only claim credit for the extra &quot;for&quot; loop.)</p>

<p class="table_caption">Table 1. Erlang-C values at 250 ms / request</p>
<table>
<tr><th>N</th><th>Pr_Queue (Erlang-C)</th></tr>
<tr><td>67</td><td>undef</td></tr>
<tr><td>68</td><td>undef</td></tr>
<tr><td>69</td><td>undef</td></tr>
<tr><td>70</td><td>0.921417281</td></tr>
<tr><td>71</td><td>0.791698369</td></tr>
<tr><td>72</td><td>0.676255938</td></tr>
<tr><td>73</td><td>0.574128540</td></tr>
<tr><td>74</td><td>0.484342834</td></tr>
<tr><td>75</td><td>0.405921606</td></tr>
<tr><td>76</td><td>0.337892350</td></tr>
<tr><td>77</td><td>0.279296163</td></tr>
<tr><td>78</td><td>0.229196685</td></tr>

<tr><td>79</td><td>0.186688788</td></tr>
<tr><td>80</td><td>0.150906701</td></tr>
<tr><td>81</td><td>0.121031288</td></tr>
<tr><td>82</td><td>0.096296202</td></tr>
<tr><td>83</td><td>0.075992736</td></tr>
<tr><td>84</td><td>0.059473196</td></tr>
<tr><td>85</td><td>0.046152756</td></tr>
<tr><td>86</td><td>0.035509802</td></tr>
<tr><td>87</td><td>0.027084849</td></tr>
<tr><td>88</td><td>0.020478191</td></tr>
<tr><td>89</td><td>0.015346497</td></tr>
<tr><td>90</td><td>0.011398581</td></tr>
<tr><td>91</td><td>0.008390600</td></tr>
<tr><td>92</td><td>0.006120940</td></tr>
<tr><td>93</td><td>0.004424999</td></tr>
<tr><td>94</td><td>0.003170077</td></tr>
<tr><td>95</td><td>0.002250524</td></tr>
<tr><td>96</td><td>0.001583268</td></tr>
<tr><td>97</td><td>0.001103786</td></tr>
<tr><td>98</td><td>0.000762573</td></tr>
<tr><td>99</td><td>0.000522098</td></tr>
</table>

<p>From Table 1, I can immediately see that anything fewer than 70 threads will never keep up. With fewer than 70 threads, the queue of unprocessed requests will grow without bound. I need at least 91 threads to get below a 1% chance that a request will be delayed by queueing.</p>
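<p>If you would rather not open a spreadsheet, the sketch below computes the same kind of values with the textbook Erlang-B recurrence and the usual Erlang-B to Erlang-C conversion. It is not the C program linked above, and the function names are my own. It assumes the table was built from an offered load of one million requests per hour at 250 ms each.</p>

```python
def erlang_b(servers: int, load: float) -> float:
    """Erlang-B blocking probability, using the standard recurrence."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b

def erlang_c(servers: int, load: float):
    """Erlang-C probability of queueing, or None when the load cannot be carried."""
    if load >= servers:
        return None  # the queue grows without bound ("undef" in Table 1)
    b = erlang_b(servers, load)
    utilization = load / servers
    return b / (1.0 - utilization * (1.0 - b))

ARRIVALS_PER_MS = 1_000_000 / (60 * 60 * 1000)
load = ARRIVALS_PER_MS * 250  # offered load in Erlangs at 250 ms per request

for threads in (69, 70, 80, 90, 91):
    print(threads, erlang_c(threads, load))

# Smallest pool that keeps up, and smallest pool with under a 1% chance of queueing.
first_stable = next(n for n in range(1, 200) if erlang_c(n, load) is not None)
first_under_1pct = next(n for n in range(first_stable, 200) if 0.01 > erlang_c(n, load))
print("keeps up at", first_stable, "threads; under 1% at", first_under_1pct)
```

<p>The recurrence avoids computing large factorials directly, which is why it stays well behaved even with a hundred or more threads.</p>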

<h2>Performance and Capacity</h2>
<p>Now, what happens if the average request processing time goes up by 100 milliseconds on those same million requests? Adjusting the parameters, I get Table 2.</p>

<p class="table_caption">Table 2. Erlang-C values at 350 ms / request</p>
<table>
<tr><th>N</th><th>Pr_Queue (Erlang-C)</th></tr>
<tr><td>96</td><td>undef</td></tr>
<tr><td>97</td><td>undef</td></tr>
<tr><td>98</td><td>0.907100356</td></tr>
<tr><td>99</td><td>0.797290966</td></tr>
<tr><td>100</td><td>0.697789489</td></tr>
<tr><td>101</td><td>0.608014385</td></tr>
<tr><td>102</td><td>0.527376532</td></tr>
<tr><td>103</td><td>0.455282634</td></tr>
<tr><td>104</td><td>0.391138874</td></tr>
<tr><td>105</td><td>0.334354749</td></tr>
<tr><td>106</td><td>0.284347016</td></tr>
<tr><td>107</td><td>0.240543652</td></tr>
<tr><td>108</td><td>0.202387733</td></tr>
<tr><td>109</td><td>0.169341130</td></tr>
<tr><td>110</td><td>0.140887936</td></tr>
<tr><td>111</td><td>0.116537521</td></tr>
<tr><td>112</td><td>0.095827141</td></tr>
<tr><td>113</td><td>0.078324041</td></tr>
<tr><td>114</td><td>0.063626999</td></tr>
<tr><td>115</td><td>0.051367297</td></tr>
<tr><td>116</td><td>0.041209109</td></tr>
<tr><td>117</td><td>0.032849334</td></tr>
<tr><td>118</td><td>0.026016901</td></tr>
<tr><td>119</td><td>0.020471625</td></tr>
<tr><td>120</td><td>0.016002658</td></tr>
<tr><td>121</td><td>0.012426630</td></tr>
<tr><td>122</td><td>0.009585560</td></tr>
<tr><td>123</td><td>0.007344611</td></tr>
<tr><td>124</td><td>0.005589775</td></tr>
<tr><td>125</td><td>0.004225555</td></tr>
</table>

<p>Now we need a minimum of 98 threads before we can even expect to keep up, and we need 122 threads to get down under that 1% queuing threshold.</p>

<p>On the other hand, what about increasing performance by 100 milliseconds per request? I'll let you run the calculator for that, but it looks to me like we need between 42 and 59 threads to meet the same thresholds.</p>

<p>That swing, from 150 to 350 milliseconds per request, makes a huge difference in the number of concurrent threads your system must support to handle a million requests per hour---almost a factor of 3. Would you be willing to triple your hardware for the same request volume? Next time anyone says that "CPU is cheap", fold your arms and tell them "Erlang would not approve." On the flip side, it might be worth spending some administrator time on performance tuning to bring down your average page latency. Or maybe some programmer time to integrate <a href="http://www.danga.com/memcached/">memcached</a> so every single page doesn't have to trudge all the way to the database.</p>
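<p>To see the swing side by side, here is a small sketch that finds both thresholds for each average service time. It uses the textbook Erlang-C formula rather than the spreadsheet, and <code>thread_thresholds</code> is a name I made up for this example.</p>

```python
import math

def erlang_c(servers: int, load: float) -> float:
    """Erlang-C probability that a request queues; requires load below servers."""
    b = 1.0
    for n in range(1, servers + 1):  # Erlang-B recurrence
        b = load * b / (n + load * b)
    return b / (1.0 - (load / servers) * (1.0 - b))

def thread_thresholds(service_ms: float,
                      requests_per_hour: float = 1_000_000,
                      target: float = 0.01):
    """Return (threads needed to keep up, threads needed to queue under target)."""
    load = requests_per_hour / 3_600_000 * service_ms  # offered load in Erlangs
    minimum = math.floor(load) + 1  # smallest whole number strictly above the load
    threads = minimum
    while erlang_c(threads, load) >= target:
        threads += 1
    return minimum, threads

for service_ms in (150, 250, 350):
    minimum, comfortable = thread_thresholds(service_ms)
    print(f"{service_ms} ms: keep up with {minimum}, under 1% queueing with {comfortable}")
```

<p>The 250 ms and 350 ms rows should agree with Tables 1 and 2. The 150 ms row is the case left as an exercise above.</p>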

<h2>Summary and Extension</h2>
<p>Obviously, there's a lot more to performance analysis for web servers than this. Over time, I'll be mixing more analytic pieces with the pragmatic, hands-on posts that I usually make. It'll take some time. For one thing, I have to go back and learn about <a href="http://mathworld.wolfram.com/StochasticProcess.html">stochastic processes</a> and <a href="http://mathworld.wolfram.com/MarkovChain.html">Markov chains</a>. Pattern recognition and signal processing I've got. Advanced probability and statistics I don't got.</p>

<p>In fact, I'll offer a free copy of <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">Release It</a> to the first commenter who can show me how to derive an Erlang-like model that accounts for a) garbage collection times (bimodal processing time distribution), b) multiple coupled wait states during processing, c) non-equilibrium system states, and d) processing time that varies as a function of system utilization.</p> 
]]></content></entry><entry><title>Constraint, Chaos, Collapse</title><link href="https://michaelnygard.com/blog/2008/11/constraint-chaos-collapse/"/><id>https://michaelnygard.com/blog/2008/11/constraint-chaos-collapse/</id><published>2008-11-16T09:23:47-06:00</published><updated>2008-11-16T09:23:47-06:00</updated><content type="html"><![CDATA[<p><a href="http://pmuellr.blogspot.com" target="_blank">Patrick Mueller</a> has an interesting post about being <a href="http://pmuellr.blogspot.com/2008/11/brainwashed.html">brainwashed</a> into believing that the outrageous is normal. It's a good read. (Hat tip to Reddit, whence many good things.) As often happens, I wrote such a long comment to his post that I felt it worthwhile to repost here.</p><p>My comment revolves around this chart of the Dow Jones Industrial Average over the last eighty years. (For the record, I'm not disputing anything about the rest of Patrick's post. In fact, I agree with most of what he says. This chart and my comments aren't central to his discussion about web development.) Some of you know that I've worked in finance before, and most of you know I have an interest in dynamics and complex systems. It's been an <em>interesting</em> year.</p><p>Here's a snapshot of the chart in question. It's from <a target="_blank" href="http://finance.yahoo.com">Yahoo! Finance</a>, and the image links to the live chart. <br /></p><p><a href="http://finance.yahoo.com/echarts?s=%5EDJI#chart2:symbol=%5Edji;range=my;indicator=volume;charttype=line;crosshair=on;ohlcvalues=0;logscale=off;source=undefined"><img width="240" height="131" border="0" src="https://farm4.static.flickr.com/3048/3025398461_1de783eb1f_m.jpg" /></a> <br /><br />Most of the chart looks like an exponential, which suggests the effect of compound growth. In a functioning capital-based system you'd expect exactly that. Capital invested produces more capital. Any time an output is also a required input, you get exponential growth. 
One of Patrick's <a target="_blank" href="http://pmuellr.blogspot.com/2008/11/brainwashed.html?showComment=1226556240000#c8274809948738722601">other commenters</a> points out that it looks almost linear when plotted on a logarithmic scale... a dead giveaway of an exponential.<br /><br />No real system can produce infinite growth. Instead, they always hit a constraint. That could be a physical limitation on the available inputs. It could be a limit on the throughput of the system itself. In a sense, it almost doesn't matter what the constraint itself happens to be. Rather, you should assume that a constraint exists.<br /><br />In systems with a chaotic tendency, the system doesn't slow down at all when approaching the constraint. In fact, it may be increasing at its greatest rate just before the constraint clamps down hardest. In such cases, you'll either see a catastrophic collapse or a chaotic fluctuation.<br /></p><p>I don't know what the true constraint was in the financial system. Plenty of other people believe they know, and I'm happy to let them believe what they like. Just from looking at the chart, though, you could make a strong case that we really hit the constraint in 1999 and the rest has been chaos since then. <br /></p> 
]]></content></entry><entry><title>Licensing for Windows on EC2</title><link href="https://michaelnygard.com/blog/2008/10/licensing-for-windows-on-ec2/"/><id>https://michaelnygard.com/blog/2008/10/licensing-for-windows-on-ec2/</id><published>2008-10-26T07:11:53-05:00</published><updated>2008-10-26T07:11:53-05:00</updated><content type="html"><![CDATA[<p>One thing I noticed when I <a href="/blog/2008/10/windows-on-ec2-from-a-mac/">fired</a> up my first <a target="_blank" href="http://aws.amazon.com/windows">Windows instances on EC2</a> was that Windows never asked me for a license key.&nbsp; From examining the registry, it appears that a valid license key is installed at boot time.&nbsp; On two instances of image ami-b53cd8dc (ec2-public-windows-images/Server2003r2-i386-anon-v1.01 for i386) I got exactly the same key.</p><p>Likewise, on two different instances of ami-7b2bcf12 (ec2-public-windows-images/Server2003r2-x86_64-anon-v1.00 for x64), I got the same license key--though not the same key as the i386 image.</p><p>This tells me that the license key is probably baked into the image. It's also possible that these particular license keys are unique to my account. If someone else wants to compare keys, it'd be an interesting experiment.</p><p>Either way, the extra 2.5 cents per hour on the small instance must go to Microsoft to pay for license rental. <br /></p><p>&nbsp;</p> 
]]></content></entry><entry><title>Windows on EC2, from a Mac</title><link href="https://michaelnygard.com/blog/2008/10/windows-on-ec2-from-a-mac/"/><id>https://michaelnygard.com/blog/2008/10/windows-on-ec2-from-a-mac/</id><published>2008-10-23T13:54:30-05:00</published><updated>2008-10-23T13:54:30-05:00</updated><content type="html"><![CDATA[<p>It may be a bit perverse, but I wanted to hit a Windows EC2 instance from my Mac. After a little hitch getting started, I got it to work. There are a few quirks about accessing Windows instances, though.</p>

<p>First off, SSH is not enabled by default. You'll need to use remote desktop to access your instance. Remote desktop uses port 3389, so the first step is to create a new security group for Windows desktop access:</p>

<pre>
$ ec2-add-group windows -d 'Windows remote desktop access'
GROUP    windows    Windows remote desktop access
</pre>

<p>Then, allow access to port 3389 from your desired origin. I'm allowing it from anywhere, which isn't a great idea, but I'm on the road a lot. I never know what the hotel's network origin will be.</p>

<pre>
$ ec2-authorize windows -p 3389 -P tcp
GROUP        windows    
PERMISSION        windows    ALLOWS    tcp    3389    3389    FROM    CIDR    0.0.0.0/0
</pre>

<p>Obviously, you could add that permission to any existing group that you already use.</p>

<p>There's a bit of a song and dance to log in. Where Linux instances typically use SSH with public-key authentication, Windows server requires a typed password. Amazon has come up with a reasonable, but slightly convoluted, way to extract a randomized password.</p>

<p>You will need to start your instance in the new security group and with a keypair. The docs could be a little clearer, in that here you're providing the <em>name</em> of the keypair as it was registered with EC2. The first few times I tried this, I was giving it the path of the file containing the keypair, which doesn't work.</p>

<pre>
$ ec2-describe-keypairs
KEYPAIR    devkeypair    02:10:65:9e:51:73:7e:93:bd:30:e2:5d:91:03:d5:e1:d4:0e:c0:f4
$ ec2-run-instances ami-782bcf11 -g windows -k devkeypair
RESERVATION    r-82429ceb    001356815600    windows
INSTANCE    i-f172db98    ami-782bcf11            pending    devkeypair    0        m1.small    2008-10-23T20:01:36+0000    us-east-1a            windows
</pre>

<p>After all that, and waiting through a Windows boot cycle, you can access the Windows desktop through RDP.</p>

<p>What's that? You don't have an RDP client, because you're a Mac user? I like <a target="_blank" href="http://cord.sourceforge.net/">CoRD</a> for that. I also saw a lot of references to <a href="http://rdesktop.darwinports.com/">rdesktop</a>, which is available through <a target="_blank" href="http://darwinports.com/">Darwin Ports</a>. (For today, I wasn't prepared to install Ports just to try out the Windows EC2 instance!)</p><p>Extract the public IP address of your instance:</p>

<pre>
$ ec2-describe-instances
RESERVATION    r-82429ceb    001356815600    windows
INSTANCE    i-f172db98    ami-782bcf11    ec2-75-101-252-238.compute-1.amazonaws.com    domU-12-31-39-02-48-31.compute-1.internal    running    devkeypair    0        m1.small    2008-10-23T20:01:36+0000    us-east-1a        windows
</pre>

<p>Fire up CoRD and paste the IP address into &quot;Quick Connect&quot;. </p>

<p><img height="219" width="420" style="border: 0px none ; float: none;" src="/images/blog/win_ec2/windows_ec2_login.png" /></p>

<p>Well, now what? Obviously, you'll use &quot;Administrator&quot; as the username, but what's the password? There's a new command in the latest release of ec2-api-tools called &quot;ec2-get-password&quot;.</p>

<pre>
$ ec2-get-password i-f172db98 -k keys/devkeypair.pem
edhnsNG1J5
</pre>

<p>Note that this time, I'm using the path of my keypair file. EC2 uses this to decrypt the password from the instance's console output. At boot time, Windows prints out the password, encrypted with the public key from the keypair you named when starting the instance.</p>

<p>Success at last: fully logged in to my virtual Windows server from my Mac desktop.</p>

<p><img height="731" width="887" style="border: 0px none ; float: none;" src="/images/blog/win_ec2/windows_ec2_logged_in.png" /></p> 
]]></content></entry><entry><title>Don't Break My Heart, EC2!</title><link href="https://michaelnygard.com/blog/2008/10/dont-break-my-heart-ec2/"/><id>https://michaelnygard.com/blog/2008/10/dont-break-my-heart-ec2/</id><published>2008-10-23T11:22:59-05:00</published><updated>2008-10-23T11:22:59-05:00</updated><content type="html"><![CDATA[<p>I'm a huge booster of <a href="http://aws.amazon.com">AWS</a> and <a href="http://aws.amazon.com/ec2">EC2</a>. I have two talks about cloud computing, and one that's pretty specific to AWS, on the <a href="http://www.nofluffjuststuff.com">No Fluff, Just Stuff</a> traveling symposium.</p>

<p>With today's announcement about EC2 coming out of beta, and about <a href="http://aws.amazon.com/windows">Windows</a> support, I wanted to try out a Windows server on EC2.</p>

<p>Heartbreak!</p>

<pre>ec2-describe-images -a | grep windows
IMAGE    ami-782bcf11    ec2-public-windows-images/Server2003r2-i386-anon-v1.00.manifest.xml    amazon    available    public        i386    machine        
IMAGE    ami-792bcf10    ec2-public-windows-images/Server2003r2-i386-EntAuth-v1.00.manifest.xml    amazon    available    public        i386    machine        
IMAGE    ami-7b2bcf12    ec2-public-windows-images/Server2003r2-x86_64-anon-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-7a2bcf13    ec2-public-windows-images/Server2003r2-x86_64-EntAuth-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-3934d050    ec2-public-windows-images/SqlSvrExp2003r2-i386-Anon-v1.00.manifest.xml    amazon    available    public        i386    machine        
IMAGE    ami-0f34d066    ec2-public-windows-images/SqlSvrExp2003r2-i386-EntAuth-v1.00.manifest.xml    amazon    available    public        i386    machine        
IMAGE    ami-8135d1e8    ec2-public-windows-images/SqlSvrExp2003r2-x86_64-Anon-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-9835d1f1    ec2-public-windows-images/SqlSvrExp2003r2-x86_64-EntAuth-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-6834d001    ec2-public-windows-images/SqlSvrStd2003r2-x86_64-Anon-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-6b34d002    ec2-public-windows-images/SqlSvrStd2003r2-x86_64-EntAuth-v1.00.manifest.xml    amazon    available    public        x86_64    machine        
IMAGE    ami-cd8b6ea4    khaz_windows2003srvEE/image.manifest.xml    602961847481    available    public        i386    machine        

mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10
Server.InsufficientInstanceCapacity: Insufficient capacity.
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10
Server.InsufficientInstanceCapacity: Insufficient capacity.
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10 -z us-east-1a
Server.InsufficientInstanceCapacity: Insufficient capacity.
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10 -z us-east-1b
Server.InsufficientInstanceCapacity: Insufficient capacity.
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10 -z us-east-1c
Server.InsufficientInstanceCapacity: Insufficient capacity.
</pre>

<p>Ack! Insufficient capacity?! That's not supposed to happen. Wait a second... let me try my own image:</p>

<pre>
mtnygard@donk /var/tmp/nms $ ec2-describe-images
IMAGE    ami-8a0beee3    com.michaelnygard/nms-base-v1.manifest.xml    001356815600    available    private        i386    machine        
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-8a0beee3
RESERVATION    r-0c4a9465    001356815600    default
INSTANCE    i-8e79d0e7    ami-8a0beee3            pending        0        m1.small    2008-10-23T17:25:21+0000    us-east-1c        
mtnygard@donk /var/tmp/nms $ ec2-run-instances ami-792bcf10
Server.InsufficientInstanceCapacity: Insufficient capacity.
</pre>

<p>Very interesting.  Looks like there's enough capacity to run all the Linux-based images, but not enough for Windows?</p>

<p>Seems like there might be some contractual limit on how many Windows licenses Amazon is allowed to rent out.  I would also infer some serious pent-up demand to eat them all up this quickly.</p>

<p>Or maybe it's just a glitch. We'll see.</p>

<p><b>Update [1:15 PM]</b> I was just able to start five instances. Could be fluctuations in demand, or it could be clearing of a glitch. It's always hard to tell what's really happening inside the cloud.</p>

<p><b>Update [2:50 PM]</b> My plaintive <a href="http://developer.amazonwebservices.com/connect/thread.jspa?messageID=104684">post in the AWS forums</a> got a very quick response. The inscrutable wizard <a href="http://developer.amazonwebservices.com/connect/profile.jspa?userID=53875">JeffW</a> posted "we're working on it" and "it's fixed" messages just 3 minutes apart. We'll probably never know quite what was going on.</p> 
]]></content></entry><entry><title>Perfection is Not Always Required</title><link href="https://michaelnygard.com/blog/2008/10/perfection-is-not-always-required/"/><id>https://michaelnygard.com/blog/2008/10/perfection-is-not-always-required/</id><published>2008-10-14T21:08:22-05:00</published><updated>2008-10-14T21:08:22-05:00</updated><content type="html"><![CDATA[<p>In my series on dirty data, I made the argument that sometimes incomplete, inaccurate, or inconsistent data was OK. In fact, not only is it OK, but it can be an advantage.</p><p>There's a really slick Ruby library called <a href="http://github.com/peterc/whatlanguage/tree/master" target="_blank">WhatLanguage</a> that illustrates this beautifully. The author also wrote a <a href="http://www.rubyinside.com/whatlanguage-ruby-language-detection-library-1085.html" target="_blank">nice article</a> introducing the library. WhatLanguage automatically determines the language that a piece of text is written in.</p><p>For example (from the article)</p><pre><span class="ident">require</span> <span class="punct">'</span><span class="string">whatlanguage</span><span class="punct">'</span><br /><br /><span class="punct">&quot;</span><span class="string">Je suis un homme</span><span class="punct">&quot;.</span><span class="ident">language</span>      <span class="comment"># =&gt; :french</span><br /></pre><p>Very nice.</p><p>WhatLanguage works by comparing words in the input text to a <a href="http://en.wikipedia.org/wiki/Bloom_filter" target="_blank">data structure</a> that can tell you whether a word exists in the corpus. There's the catch, though. It can return a false positive! That would mean you get an incorrect &quot;yes&quot; sometimes for words that <em>aren't</em> in the language in question. On the other hand, it's guaranteed against false negatives.</p><p>You might imagine that there are pretty limited circumstances when you'd use a data structure that sometimes returns incorrect answers. 
(There is a calculable probability of a false positive. It never reaches zero.) It works for WhatLanguage, though.</p><p>You see, each word contributes to a histogram binned by possible language. Ultimately, one language &quot;wins&quot;, based on whichever has the most entries in the histogram. False positives may contribute an extra point to incorrect languages, but the correct language will pretty much always emerge from the noise, provided there's enough source text to work from.</p><p>So, there's another example of information emerging from noisy inputs, just as long as there's enough of it. <br /></p><p>&nbsp;</p><p>&nbsp;</p> 
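<p>To make the idea concrete, here is a minimal sketch of voting over per-language Bloom filters. It is not WhatLanguage's actual code. The <code>BloomFilter</code> class, the filter size, the hashing scheme, and the tiny word lists are all made up for illustration.</p>

```python
import hashlib
from collections import Counter

class BloomFilter:
    """Set-membership test that may say yes wrongly but never says no wrongly."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, word: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(word))

# Hypothetical word lists; a real corpus would hold many thousands of words.
WORDS = {
    "english": "i am a man the and is of".split(),
    "french": "je suis un homme le et est de".split(),
}

FILTERS = {}
for lang, words in WORDS.items():
    bloom = BloomFilter()
    for w in words:
        bloom.add(w)
    FILTERS[lang] = bloom

def guess_language(text: str):
    """Give each language one vote per word its filter claims to contain."""
    votes = Counter()
    for word in text.lower().split():
        for lang, bloom in FILTERS.items():
            if word in bloom:
                votes[lang] += 1
    return votes.most_common(1)[0][0] if votes else None

print(guess_language("Je suis un homme"))
```

<p>A false positive can hand a stray vote to the wrong language, but every word that really is in a corpus always scores for it, so the right language should pull ahead as the text gets longer.</p>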
]]></content></entry><entry><title>Arrival at JAOO</title><link href="https://michaelnygard.com/blog/2008/09/arrival-at-jaoo/"/><id>https://michaelnygard.com/blog/2008/09/arrival-at-jaoo/</id><published>2008-09-27T23:34:14-05:00</published><updated>2008-09-27T23:34:14-05:00</updated><content type="html"><![CDATA[<p>Considering that it's 7:30 AM local time---where &quot;local&quot; means <a target="_blank" href="http://en.wikipedia.org/wiki/Aarhus">Aarhus, Denmark</a>---and I'm awake and online, it looks like I've successfully reset my internal clock.&nbsp; Of course, my approach consisted of staying awake for 28 hours continuously then having three excellent beers with dinner.&nbsp; There are probably easier ways, and there may be repercussions later.</p><p>I've always heard good things about <a target="_blank" href="http://jaoo.dk/conference/">JAOO</a>, so it was an honor and a delight to be invited. So far, just hanging around the hotel has been interesting. Waiting to check in yesterday evening, I encountered Richard Gabriel and one of the guys who designed <a target="_blank" href="http://www.microsoft.com/windowsserver2003/technologies/management/powershell/default.mspx">Windows PowerShell</a>. (He still calls it Monad, which I think was a much better name than &quot;PowerShell&quot;.&nbsp; Also, I wish I'd gotten his name, but I was a bit too distracted by the problem with my reservation.)</p><p>After dinner, I started chatting with some <a target="_blank" href="http://www.thoughtworks.com/">ThoughtWorkers</a> over a game of <a target="_blank" href="http://www.wunderland.com/LooneyLabs/Fluxx/Zombie/">ZombieFluxx</a>. Two observations: first, ZombieFluxx is the kind of game that only a computer programmer or a lawyer could love. The deck of cards includes many cards that change the rules of the game itself. Gameplay changes from turn to turn based on the current state of the rule cards showing. 
There's even a card that requires you to groan like the undead whenever you turn over a new &quot;zombie&quot; card. Very meta.&nbsp; Second, it seems that TW people make up half of every conference I go to. They must have a fantastic training budget, because they are disproportionately represented relative to their much larger competitors like Accenture, Deloitte, and that crowd. Woe to the conference industry if ThoughtWorks falls on hard times.</p><p>My primary goal for today was to get over jetlag. Having accomplished that before 8 AM, I'll now see about straightening out my hotel situation. It's hard to think much about software when you may not have a roof over your head come nightfall. </p><p><strong>Update</strong>: Got my hotel issues resolved. Now at a thoroughly modern, thoroughly Danish hotel called the &quot;Best Western Oasia&quot;. Funny, but I always think of &quot;Best Western&quot; as the cruddy, mildewed cheap hotels off the Interstate in places like west Texas and Birmingham, Alabama. This hotel may cause me to reevaluate that image! It's nice, in a kind of &quot;living inside Ikea&quot; way.</p><p>(And, yes, I know Ikea is Swedish, not Danish. It's the bare wood, spare furnishings, and black lacquer I'm talking about.) <br /></p> 
]]></content></entry><entry><title>The Infamous Seinfeld-Gates Ad</title><link href="https://michaelnygard.com/blog/2008/09/the-infamous-seinfeld-gates-ad/"/><id>https://michaelnygard.com/blog/2008/09/the-infamous-seinfeld-gates-ad/</id><published>2008-09-05T17:07:07-05:00</published><updated>2008-09-05T17:07:07-05:00</updated><content type="html"><![CDATA[<p>The Seinfeld/Gates ad is so laughably bad that people are already building indexes of the negative reactions, less than 24 hours after it launched.</p><p>I have my own take on it.</p><p>Gates is the most recognizable geek on the planet. For most non-techies, he is the archetype of geekhood.</p><p>What kind of name recognition does Steve Ballmer have?&nbsp; Outside of developers, developers, developers, and developers.&nbsp; Would a silver-haired manager ever use him for a cheesy business analogy in a meeting?&nbsp; Nope. Blank looks all around.&nbsp; Tiger Woods and Bill Gates make good metaphors. Steve Ballmer doesn't.</p><p>Ray Ozzie? Not a chance. Even most techies don't know who Ozzie is.</p><p>This commercial wasn't about churros, <a href="http://www.shoecarnival.com/">The Conquistador</a>, or briefs riding up. It was all about one line. <br /></p><p>&quot;Brain meld&quot;.</p><p>It slipped by fast, but that was it. That was the line where billg@microsoft.com began the public torch-passing ceremony.</p><p>A couple more spots, and we'll see either Ballmer or Ozzie entering the plot. Then we get the handoff, where John Q. Public is now meant to understand, &quot;OK, Bill Gates has retired, but he's passed his wireframe glasses and nervous tics on to <em>this</em> guy.&quot; </p><p>Seriously, it's torch-passing.&nbsp; Don't believe me? You will when you see Ballmer <a href="http://www.geekologie.com/2008/08/post_44.php">air-running past a giant BSOD</a> in the final ad.</p> 
]]></content></entry><entry><title>In Korean</title><link href="https://michaelnygard.com/blog/2008/09/in-korean/"/><id>https://michaelnygard.com/blog/2008/09/in-korean/</id><published>2008-09-04T16:55:30-05:00</published><updated>2008-09-04T16:55:30-05:00</updated><content type="html"><![CDATA[<p>&quot;Release It&quot; has now been translated into Korean. I just received three copies of a work that's hauntingly familiar, but totally opaque to me.</p><p>I kind of wonder how the pop-culture jokes came through.&nbsp; I bet C3PO and R2D2 made it OK, but I wonder whether &quot;dodge, duck, dip, dive, and dodge&quot; made it past the Korean copy editor.&nbsp; (For that matter, I'm faintly surprised it made it past the English copy editor.)<br /></p> 
]]></content></entry><entry><title>ReadWriteWeb on Dirty Data</title><link href="https://michaelnygard.com/blog/2008/08/readwriteweb-on-dirty-data/"/><id>https://michaelnygard.com/blog/2008/08/readwriteweb-on-dirty-data/</id><published>2008-08-24T22:29:50-05:00</published><updated>2008-08-24T22:29:50-05:00</updated><content type="html"><![CDATA[<p>A short while back, I did a <a href="/blog/2008/07/mounds-of-filthy-data/">brief series</a> on the value of &quot;dirty data&quot;---copious amounts of unstructured, non-relational data created by the many interactions users have with your site and each other.</p><p><a href="http://www.readwriteweb.com" target="_blank">ReadWriteWeb</a> has a post up about Four Ad-Free Ways that Mined Data Can Make Money, along very similar lines.&nbsp; Well worth a read. <br /></p> 
]]></content></entry><entry><title>97 Things Every Software Architect Should Know</title><link href="https://michaelnygard.com/blog/2008/08/97-things-every-software-architect-should-know/"/><id>https://michaelnygard.com/blog/2008/08/97-things-every-software-architect-should-know/</id><published>2008-08-19T14:12:26-05:00</published><updated>2008-08-19T14:12:26-05:00</updated><content type="html"><![CDATA[<p>O'Reilly is creating a new line of &quot;community-authored&quot; books. One of them is called &quot;97 Things Every Software Architect Should Know&quot;.</p><p>All of the &quot;97 Things&quot; books will be created by wiki, with the best entries being selected from all the wiki contributions.</p><p>I've contributed several axioms that have been selected for the book:</p><ul><li><a href="http://97-things.near-time.net/wiki/show/talk-about-the-arch-but-see-the-scaffolding-beneath-it">Talk about the arch, but see the scaffolding beneath it</a></li><li><a href="http://97-things.near-time.net/wiki/show/you-re-negotiating-more-often-than-you-think">You're negotiating more often than you think</a></li><li><a href="http://97-things.near-time.net/wiki/show/Software%20architecture%20has%20ethical%20consequences">Software architecture has ethical consequences</a></li><li><a href="http://97-things.near-time.net/wiki/show/don-t-put-your-resume-ahead-of-the-requirements"> </a><a href="http://97-things.near-time.net/wiki/show/Everything%20will%20ultimately%20fail">Everything will ultimately fail</a> <br /></li><li><a href="http://97-things.near-time.net/wiki/show/Engineer%20in%20the%20white%20spaces">Engineer in the white spaces</a> <br /></li></ul><p>Long-time readers of this blog may recognize some of these themes.</p><p>You can see the <a href="http://97-things.near-time.net/wiki">whole wiki here</a>. <br /></p><p>&nbsp;</p> 
]]></content></entry><entry><title>How Buildings Learn</title><link href="https://michaelnygard.com/blog/2008/08/how-buildings-learn/"/><id>https://michaelnygard.com/blog/2008/08/how-buildings-learn/</id><published>2008-08-19T10:14:43-05:00</published><updated>2008-08-19T10:14:43-05:00</updated><content type="html"><![CDATA[<p>Stewart Brand's famous book <a href="https://www.amazon.com/gp/product/0140139966?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0140139966">How Buildings Learn</a> has been on my reading queue for a while, possibly a few years. Now that I've begun reading it, I wish I had gotten it sooner. Listen to this:</p>

<blockquote>The finished-looking model and visually obsessive renderings dominate the let's-do-it meeting, so that shallow guesses are frozen as deep decisions. All the design intelligence gets forced to the earliest part of the building process, when everyone knows the least about what is really needed.</blockquote>

<p>Wow. It's hard to tell what industry he's talking about there. It could easily apply to software development. No wonder Brand is so well-regarded in the Agile community!</p>

<p>Another wonderful parallel is between what Brand calls &quot;Low Road&quot; and &quot;High Road&quot; buildings. A Low Road building is one that is flexible, cheap, and easy to modify. It's hackable. Lofts, garages, old factory floors, warehouses, and so on. Each new owner can gut and modify it without qualms. A building where you can drill holes through the walls, run your own cabling, and rip out every interior wall is a Low Road building.</p>

<p>High Road buildings evolve gradually over time, through persistent care and love. There doesn't necessarily have to be a consistent--or even coherent--vision, but each owner does need to feel a strong sense of preservation. High Road buildings become monuments, but they aren't <i>made</i> that way. They just evolve in that direction as each generation adds its own character.</p>

<p>Then there are the buildings that aren't High or Low Road. Too static to be Low Road, but not valued enough to be High Road. Resistant to change, bureaucratic in management. Diffuse responsibility produces static (i.e., dead) buildings. Deliberately setting out to design a work of art, paradoxically, prevents you from creating a living, livable building.</p>

<p>Again, I see some clear parallels to software architecture here. On the one hand, we've got Low Road architecture. Easy to glue together, easy to rip apart. Nobody gets bent out of shape if you blow up a hodge-podge of shoestring batch jobs and quick-and-dirty web apps. CGI scripts written in perl are classic Low Road architecture. It doesn't mean they're bad, but they're probably not going to go a long time without being changed in some massive ways.</p>

<p>High Road architecture would express a conservatism that we don't often see. High Road is <i>not</i> &quot;big&quot; architecture. Rather, High Road means cohesive systems lovingly tended. Emacs strikes me as a good example of High Road architecture. Yes, it's accumulated a lot of bits and oddments over the years, but it's quite conservative in its architecture.</p>

<p>Enterprise SOA projects, to me, seem like dead buildings. They're overspecified and too focused on the moment of rollout. They're the grand facades with leaky roofs. They're the corporate office buildings that get gerrymandered into paralysis. They preach change, but produce stasis.</p>

 
]]></content></entry><entry><title>Dan Pritchett on Availability</title><link href="https://michaelnygard.com/blog/2008/08/dan-pritchett-on-availability/"/><id>https://michaelnygard.com/blog/2008/08/dan-pritchett-on-availability/</id><published>2008-08-17T08:03:19-05:00</published><updated>2008-08-17T08:03:19-05:00</updated><content type="html"><![CDATA[<p><a href="http://www.linkedin.com/in/driveawedge">Dan Pritchett</a> is a man after my own heart. His latest post talks about the path to availability enlightenment. The obvious path--reliable components and vendor-supported commercial software--leads only to tears.</p><p>You can begin on the path to enlightenment when you set aside dreams of perfect software running on perfect hardware, talking over perfect networks. Instead, embrace the reality of fallible components. Don't design around them, design for them. </p><p>How do you design for failure-prone components? That's what most of <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">Release It!</a> is all about. <br /></p> 
]]></content></entry><entry><title>Agile Tool Vendors</title><link href="https://michaelnygard.com/blog/2008/08/agile-tool-vendors/"/><id>https://michaelnygard.com/blog/2008/08/agile-tool-vendors/</id><published>2008-08-08T08:06:35-05:00</published><updated>2008-08-08T08:06:35-05:00</updated><content type="html"><![CDATA[<p>There seems to be something inherently contradictory about &quot;Enterprise&quot; agile tool vendors. There's never been a tool invented that's as flexible in use or process as the 3x5 card. No matter what, any tool must embed some notion of a process, or at least a meta-process.</p><p>I've looked at several of the &quot;agile lifecycle management&quot; and &quot;agile project management&quot; tools this week. To me, they all look exactly like regular project management tools. They just have some different terminology and ajax-y web interfaces.</p><p>Vendors, listen: just because you've got a drag-and-drop rectangle on a web page doesn't make it agile!</p><p>The point of agile tools isn't to move cards around the board in ever-cooler ways. It isn't to automatically generate burndown graphs and publish them for management.</p><p>The point of agile tools is this: at any time, the team can choose to rip up the pavement and do it differently next iteration.</p><p>What happens once you've paid a bunch of money for some enterprise lifecycle management tool from one of <a href="http://www.google.ca/search?q=agile+project+management+tool&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">these outfits</a>? (Name them and they appear, so I won't.) Investment requires use. Once you've paid for something---or once your boss has paid for it---you'll be stuck using it. </p><p>Now look, I'm not against tools. I use them as force multipliers all the time. I just don't want to get stuck with some albatross of a PLM, ALM, LFCM, or LEM, just because we paid a gob of money for it. 
</p><p>The only agile tools I want are those I can throw away without qualm when the team decides it doesn't fit any more. If the team cannot change its own processes and tools, then it cannot adapt to the things it learns. If it cannot adapt, it isn't agile. Period.</p> 
]]></content></entry><entry><title>Beyond the Village</title><link href="https://michaelnygard.com/blog/2008/07/beyond-the-village/"/><id>https://michaelnygard.com/blog/2008/07/beyond-the-village/</id><published>2008-07-29T06:40:25-05:00</published><updated>2008-07-29T06:40:25-05:00</updated><content type="html"><![CDATA[<p>As an organization scales up, it must navigate several transitions. If it fails to make these transitions well, it will stall out or disappear.</p><p>One of them happens when the company grows larger than &quot;village-sized&quot;. In a village of about 150 people or less, it's possible for you to know everyone else. Larger than that, and you need some kind of secondary structures, because personal relationships don't reach from every person to every other person. Not coincidentally, this is also the size where you see startups introducing mid-level management.</p><p>There are other factors that can bring this on sooner. If the company is split into several locations, people at one location will lose track of those in other locations. Likewise, if the company is split into different practice areas or functional groups, those groups will tend to become separate villages on their own. In either case, the village transition will happen sooner than 150.<br /></p><p>It's a tough transition, because it takes the company from a flat, familial structure to a hierarchical one. That implicitly moves the axis of status from pure merit to positional. Low-numbered employees may find themselves suddenly reporting to a newcomer with no historical context. It shouldn't come as a surprise when long-time employees start leaving, but somehow the founders never expect it.<br /></p><p>This is also when the founders start to lose touch with day-to-day execution. They need to recognize that they will never again know every employee by name, family, skills, and goals. Beyond village size, the founders have to be professional managers. 
Of course, this may also be when the board (if there is one) brings in some professional managers. It shouldn't come as a surprise when founders start getting replaced, but somehow they never expect it.</p><p>&nbsp;</p> 
]]></content></entry><entry><title>S3 Outage Report and Perspective</title><link href="https://michaelnygard.com/blog/2008/07/s3-outage-report-and-perspective/"/><id>https://michaelnygard.com/blog/2008/07/s3-outage-report-and-perspective/</id><published>2008-07-26T20:39:20-05:00</published><updated>2008-07-26T20:39:20-05:00</updated><content type="html"><![CDATA[<p>Amazon has issued a more detailed statement explaining the S3 outage from June 20, 2008.&nbsp; In my company, we'd call this a &quot;Post Incident Report&quot; or PIR. It has all the necessary sections:</p><ul><li>Observed behavior</li><li>Root cause analysis</li><li>Followup actions: corrective and operational</li></ul><p>This is exactly what I'd expect from any mature service provider.</p><p>There are a few interesting bits from the report. First, the condition seems to have arisen from an unexpected failure mode in the platform's self-management protocol. This shouldn't surprise anyone. It's a new way of doing business, and some of the most innovative software development, applied at the largest scales. Bugs will creep in.</p><p>In fact, I'd expect to find more cases of odd emergent behavior at large scale.</p><p>Second, the back of my envelope still shows S3 at 99.94% availability for the year. That's better than most data center providers. It's certainly better than most corporate IT departments do.</p><p>Third, Amazon rightly recognizes that transparency is a necessary condition for trust. Many service providers would fall into the &quot;bunker mentality&quot; of the embattled organization. That's a deteriorating spiral of distrust, coverups, and spin control. Transparency is most vital after an incident. If you cannot maintain transparency then, it won't matter at any other time.</p><p>&nbsp;</p> 
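<p>For anyone who wants to redo the envelope math, here is a minimal sketch in Python. The function names are mine, and the only number taken from this post is the 99.94% figure. It simply converts between an availability percentage and hours of downtime over a 366-day year (2008 is a leap year); it does not use the actual outage duration from Amazon's report.</p>

```python
# Hypothetical helpers for back-of-the-envelope availability math.
HOURS_IN_2008 = 366 * 24  # 2008 is a leap year


def downtime_hours(availability_pct, period_hours=HOURS_IN_2008):
    """Hours of downtime implied by an availability percentage."""
    return period_hours * (1 - availability_pct / 100)


def availability_pct(down_hours, period_hours=HOURS_IN_2008):
    """Availability percentage implied by a number of downtime hours."""
    return 100 * (1 - down_hours / period_hours)


if __name__ == "__main__":
    hours = downtime_hours(99.94)
    print(f"99.94% over the year allows about {hours:.1f} hours of downtime")
```

<p>By this arithmetic, 99.94% works out to a little over five hours of downtime across the whole year.</p>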
]]></content></entry><entry><title>Article on Building Robust Messaging Applications</title><link href="https://michaelnygard.com/blog/2008/07/article-on-building-robust-messaging-applications/"/><id>https://michaelnygard.com/blog/2008/07/article-on-building-robust-messaging-applications/</id><published>2008-07-22T09:37:46-05:00</published><updated>2008-07-22T09:37:46-05:00</updated><content type="html"><![CDATA[<p>I've talked <a href="/blog/2007/09/engineering-in-the-white-space/">before</a> about adopting a failure-oriented mindset. That means you should expect every component of your system or application to someday fail. In fact, they'll usually fail at the worst possible times.</p><p>When a component does fail, whatever unit of work it's processing at the time will most likely be lost. If that unit of work is backed up by a transactional database, well, you're in luck. The database will do its <a href="http://en.wikipedia.org/wiki/Galaxy_Quest">Omega-13</a> bit on the transaction and it'll be like nothing ever happened.</p><p>Of course, if you've got more than one database, then you either need two-phase commit or pixie dust. (OK, compensating transactions can help, too, but only if the thing that failed isn't the thing that would do the compensating transaction.)</p><p>I don't favor distributed transactions, for a <a href="/blog/2007/11/architecting-for-latency/">lot of reasons</a>. They're not scalable, and I find that the risk of deadlock goes way up when you've got multiple systems accessing multiple databases. Yes, uniform lock ordering will prevent that, but I never want to trust my own application's stability to good coding practices in other people's apps.</p><p>Besides, enterprise integration through database access is just... icky.</p><p>Messaging is the way to go. Messaging offers superior scalability, better response time to users, and better resilience against partial system failures. 
It also provides enough spatial, temporal, and logical decoupling between systems that you can evolve the endpoints independently.</p><p><a href="http://www.udidahan.com/">Udi Dahan</a> has published an excellent <a href="http://msdn.microsoft.com/en-us/magazine/cc663023.aspx">article</a> with several patterns for robust messaging. It's worth reading, and studying. He addresses the real-world issues you'll encounter when building messaging apps, such as giant messages clogging up your queue, or &quot;poison&quot; messages that sit in the front of the queue causing errors and subsequent rollbacks.<br /></p> 
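<p>To make the &quot;poison&quot; message problem concrete, here is a minimal in-memory sketch in Python. It is not code from Udi's article, and it is not how any particular broker does it. The <code>drain</code> function, the retry limit of three, and the dead-letter list are all assumptions of mine for illustration. The idea is simply that a message that keeps failing gets set aside instead of blocking everything behind it.</p>

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, not taken from the article


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process every message; park repeat offenders in a dead-letter list.

    `queue` is a deque of (attempts, body) pairs. A message whose handler
    raises is re-queued at the back until it has failed `max_attempts`
    times, then it is moved aside so it cannot block the rest of the queue.
    """
    dead_letters = []
    while queue:
        attempts, body = queue.popleft()
        try:
            handler(body)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(body)
            else:
                queue.append((attempts, body))
    return dead_letters


if __name__ == "__main__":
    processed = []

    def handler(body):
        # Stand-in for real work; one message can never be handled.
        if body == "poison":
            raise ValueError("cannot parse message")
        processed.append(body)

    pending = deque((0, body) for body in ["a", "poison", "b"])
    print(drain(pending, handler), processed)
```

<p>A real messaging system would persist the attempt count and the dead-letter queue, and would probably wait between retries. The sketch skips all of that.</p>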
]]></content></entry><entry><title>Kingpins of Filthy Data</title><link href="https://michaelnygard.com/blog/2008/07/kingpins-of-filthy-data/"/><id>https://michaelnygard.com/blog/2008/07/kingpins-of-filthy-data/</id><published>2008-07-17T20:50:12-05:00</published><updated>2008-07-17T20:50:12-05:00</updated><content type="html"><![CDATA[<p>If large amounts of dirty data are actually valuable, how do you go about collecting it? Who's in the best position to amass huge piles?<br /><br />One strategy is to scavenge publicly visible data. Go screen-scrape whatever you can from web sites. That's Google's approach, along with one camp of the Semantic Web tribe.<br /><br />Another approach is to give something away in exchange for that data. Position yourself as a connector or hub. Brokers always have great visibility. The IM servers, the Twitter crowd, and the social networks in general sit in the middle of great networks of people. LinkedIn <a href="http://venturebeat.com/2008/02/06/linkedins-new-research-network-mining-social-relationships-for-business/">is pursuing</a> this approach, as are <a href="http://blog.twitter.com/2008/07/finding-perfect-match.html">Twitter+Summize</a>, and BlogLines. Facebook has already made multiple, highly creepy, attempts to capitalize on their &quot;man-in-the-middle&quot; status. Meebo is in a good spot, and trying to leverage it further. Metcalfe's Law will make it hard to break into this space, but once you do, your visibility is a great natural advantage.<br /><br />Aggregators get to see what people are interested in. <a href="http://friendfeed.com/">FriendFeed</a> is sitting on a torrential flow of dirty data. (&quot;Sewage&quot;, perhaps?) <a href="http://www.feedburner.com/">FeedBurner</a> sees the value in their dirty data.<br /><br />Anyone at the endpoint of traffic should be able to get good insight into their own world. While the aggregators and hubs get global visibility, the endpoints are naturally more limited. 
Still, that shouldn't stop them from making the most of the dirt flowing their way. Amazon has done well here.</p><p>Sun is making a run at this kind of visibility with <a href="/blog/2008/05/project-hydrazine/">Project Hydrazine</a>, but I'm skeptical. They aren't naturally in a position to collect it, and off-to-the-side instrumentation is never as powerful. Then again, companies like <a href="http://www.omniture.com/en/">Omniture</a> have made a market out of off-to-the-side instrumentation, so there's a possibility there.</p><p>Carriers like Verizon, Qwest, and AT&amp;T are in a natural position to take advantage of the traffic crossing their networks, but as parties in a regulated industry, they are mostly prohibited from looking at it.</p><p>So, if you're a carrier or a transport network, you're well positioned to amass tons of dirty data. If you are a hub or broker, then you've already got it. Otherwise, consider giving away a service to bring people in. Instead of supporting it with ad revenue, support it by gleaning valuable insight.</p><p>Just remember that a little bit of dirty data is a pain in the ass, but mountains of it are pure gold. <br /></p> 
]]></content></entry><entry><title>Inverting the Clickstream</title><link href="https://michaelnygard.com/blog/2008/07/inverting-the-clickstream/"/><id>https://michaelnygard.com/blog/2008/07/inverting-the-clickstream/</id><published>2008-07-16T11:00:00-05:00</published><updated>2008-07-16T11:00:00-05:00</updated><content type="html"><![CDATA[<p>Continuing my theme of <a href="/blog/2008/07/mounds-of-filthy-data/">filthy data</a>.</p><p>A few years ago, there was a lot of excitement around <a target="_blank" href="http://en.wikipedia.org/wiki/Clickstream">clickstream analysis</a>. This was the idea that, by watching a user's clicks around a website, you could predict things about that user.</p><p>What a backwards idea.</p><p>For any given user, you can imagine a huge number of plausible explanations for any given browsing session. You'll never enumerate all the use cases that motivate someone to spend ten minutes on seven pages of your web site.</p><p>No, the user doesn't tell us much about himself by his pattern of clicks.</p><p>But the aggregate of all the users' clicks... that tells us a lot! Not about the users, but about how the users perceive our site. It tells us about ourselves!</p><p>A commerce company may consider two products to be related for any number of reasons. Deliberate cross-selling, functional alignment, interchangeability, whatever. Any such relationships we create between products in the catalog only reflect how we view our own catalog. Flip that around, though, and look at products that the users view as related. Every day, in every session, users are telling us that products have some relationship to each other.</p><p>Hmm. But, then, what about those times when I buy something for myself and something for my kids during the same session? Or when I got that prank gift for my brother? </p><p>Once you aggregate all that dirty data, weak connections like the prank gift will just be part of the background noise. 
The connections that stand out from the noise are the real ones, the only ones that ultimately matter.</p><p>This is an inversion of the clickstream. It tells us nearly nothing about the clicker. Instead, it illuminates the clickee. <br /></p> 
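<p>Here is a rough sketch of that inversion in Python. Everything in it is an assumption for illustration: the <code>related_pairs</code> name, the made-up product IDs, and the crude &quot;seen together in at least two sessions&quot; threshold standing in for a real noise filter. It counts how many sessions viewed each pair of products together and keeps only the pairs that rise above the noise.</p>

```python
from collections import Counter
from itertools import combinations


def related_pairs(sessions, min_sessions=2):
    """Count how many sessions viewed each pair of products together.

    `sessions` is an iterable of product-ID collections, one per browsing
    session. Pairs seen in fewer than `min_sessions` sessions are treated
    as background noise and dropped.
    """
    counts = Counter()
    for viewed in sessions:
        # sorted(set(...)) gives each pair one canonical form and counts
        # it at most once per session.
        for pair in combinations(sorted(set(viewed)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_sessions}


if __name__ == "__main__":
    sessions = [
        ["tent", "sleeping-bag"],
        ["tent", "sleeping-bag", "stove"],
        ["tent", "whoopee-cushion"],  # the prank gift
        ["sleeping-bag", "tent"],
    ]
    print(related_pairs(sessions))
```

<p>With these four made-up sessions, only the tent and sleeping bag pairing survives. The prank gift shows up once and falls below the threshold, which is the whole point.</p>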
]]></content></entry><entry><title>Mounds of Filthy Data</title><link href="https://michaelnygard.com/blog/2008/07/mounds-of-filthy-data/"/><id>https://michaelnygard.com/blog/2008/07/mounds-of-filthy-data/</id><published>2008-07-16T08:15:58-05:00</published><updated>2008-07-16T08:15:58-05:00</updated><content type="html"><![CDATA[<p>Data is the future.</p><p>The barriers to entering online business are pretty low, these days. You can do it with <a href="http://www.johnmwillis.com/ibm/cloud-computing-and-the-enterprise/" target="_blank">zero infrastructure</a>, which means no capital spent on depreciating assets like servers and switches. Open source operating systems, databases, servers, middleware, libraries, and development tools mean that you don't spend money on software licenses or maintenance contracts. All you need is an idea, followed by a <a href="http://jrothman.com/blog/mpd/2008/06/waterfall-projects-create-naivete.html" target="_blank">SMOP</a>.</p><p>With both infrastructure and software costs trending toward zero, how can there be any barrier to entry?</p><p>The &quot;classic&quot; answer is the network effect, also known as <a target="_blank" href="http://en.wikipedia.org/wiki/Metcalfe%27s_Law">Metcalfe's Law</a>. (The word &quot;classic&quot; in web business models means anything more than two years old, of course.) The first Twitter user didn't get a whole lot out of it. The ten-million-and-first gets a lot more benefit. That makes it tough for a newcomer like <a href="http://plurk.com/redeemByURL?from_uid=39189&amp;check=104064598&amp;s=1">Plurk</a> to get an edge.<br /></p><p>I see a new model emerging, though. Metcalfe's Law is part of it, keeping people engaged. The best thing about having users, though, is that they do things. Every action by every user tells you something, if you can keep track of it all.<br /></p><p>Twitter gets a lot of its value from the people connected at the endpoints. 
But, they also get enormous power from being the <em>hub in the middle</em> of it. Imagine what you can do when you see the content of every message passing through a system that large. A few things come to mind right away. You could extract all the links that people are posting to see what's hot today. (Zeitgeist.) You could use semantic analysis to tell how people feel about current topics, like Presidential candidates in the U.S. You could track product names and mentions to see which products delight people and which cause frustration. You could publish a slang dictionary that actually keeps up! The possibilities are enormous.</p><p>Ah, I can already sense an objection forming. How the heck is anyone supposed to figure out all that stuff from noisy, messy textual human communication? We're cryptic, ironic, and oblique. We sometimes mean the exact opposite of what we say. Any machine intelligence that tries to grok all of Twitter will surely self-destruct, right? That supposed &quot;data&quot; is just a big steaming pile of human contradictions!</p><p>In my view, though, it's the dirtiness of the data that makes it beautiful. Yes, there will be contradictions. There will be ironic asides. But, those will come out in the wash. They'll be balanced out by the sincere, meaningful, or obvious. Not every message will be semantically clear or consistent, but given enough messy data, clear patterns will still emerge.</p><p>There's the key: enough data to see patterns. Large amounts. Huge amounts. Vast piles of filthy data.</p><p>Over the next couple of days, I'll post a series of entries exploring how to amass dirty data, who's got a natural advantage, and programming models that work with it. <br /></p> 
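<p>As a taste of how little machinery the simplest of those ideas needs, here is a small Python sketch of the &quot;what's hot today&quot; link count. The <code>hot_links</code> name, the deliberately crude URL pattern, and the sample messages are all assumptions of mine. A real hub would also have to deal with URL shorteners, spam, and sheer volume.</p>

```python
import re
from collections import Counter

# Deliberately crude URL pattern: fine for a sketch, not for production.
URL_PATTERN = re.compile(r"https?://\S+")


def hot_links(messages, top_n=3):
    """Return the `top_n` most frequently posted links across all messages."""
    counts = Counter()
    for text in messages:
        # Trailing punctuation is usually sentence punctuation, not URL.
        counts.update(url.rstrip(".,!?)") for url in URL_PATTERN.findall(text))
    return counts.most_common(top_n)


if __name__ == "__main__":
    messages = [
        "Check this out http://example.com/a",
        "lol http://example.com/a!",
        "meh http://example.com/b",
        "no links here",
    ]
    print(hot_links(messages, top_n=1))
```

<p>Notice that nothing here tries to understand any single message. The signal comes from counting across all of them.</p>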
]]></content></entry><entry><title>Hard Problems in Architecture</title><link href="https://michaelnygard.com/blog/2008/07/hard-problems-in-architecture/"/><id>https://michaelnygard.com/blog/2008/07/hard-problems-in-architecture/</id><published>2008-07-07T12:42:35-05:00</published><updated>2008-07-07T12:42:35-05:00</updated><content type="html"><![CDATA[<p>Many of the really hard problems in web architecture today exist outside the application server.&nbsp; Here are three problems that haven't been solved. Partial solutions exist today, but nothing comprehensive.<br /></p><h2>Uncontrolled demand</h2><p>Users tend to arrive at web sites in huge gobs, all at once. As the population of the Net continues to grow, and the need for content aggregators/filters grows, the &quot;front page&quot; effect will get worse.</p><p>One flavor of this is the &quot;Attack of Self-Denial&quot;, an email, radio, or TV campaign that drives enough traffic to crash the site.&nbsp; Marketing can slow you down. Really good marketing can kill you at any time.</p><h2>Versioning, deployment and rollback</h2><p>With large scale applications, as with large enterprise integrations, versioning rapidly becomes a problem. Versioning of schemas, static assets, protocols, and interfaces. Add in a dash of SOA, and you have a real nightmare. You can count on having at least one interface broken at any given time. Or, you introduce such powerful governors that nothing ever changes.</p><p>As the number of nodes increases, you eventually find that there's always at least one deployment going on. A &quot;deployment&quot; becomes less of a point-in-time activity than it is a rolling wave. A new service version will take hours or days to be deployed to every node. In the meantime, both the old and new service version must coexist peacefully. Since both service versions will need to support multiple protocol versions (see above) you have a combinatorial problem. 
</p><p>And, of course, some of these deployments will have problems of their own. Today, many application deployments are &quot;one way&quot; events. The deployment process itself has irreversibly destructive effects. This will have to change, so every deployment can be done both forward and back. Oh, and every deployment will also be deploying assets to multiple targets---web, application, and database---while also triggering cache flushes and, possibly, metadata changes to external partners like Akamai. <br /></p><p>Applications will need to participate in their own versioning, deployment, and management.&nbsp; <br /></p><h2>Blurring the lines</h2><p>There used to be a distinction between the application and the infrastructure it ran on. That meant you could move applications around and they would behave pretty much the same in a development environment as in the real production environment. These days, firewalls, load balancers, and caching appliances blur the lines between &quot;infrastructure&quot; and &quot;application&quot;. It's going to get worse as the lines between &quot;operational support system&quot; and &quot;production applications&quot; get blurred, too. Automated provisioning, SLA management, and performance management tools will all have interactions with the applications they manage. These will inevitably introduce unexpected interactions... in a word, bugs.</p><p><br /></p> 
]]></content></entry><entry><title>Creeping Fees</title><link href="https://michaelnygard.com/blog/2008/06/creeping-fees/"/><id>https://michaelnygard.com/blog/2008/06/creeping-fees/</id><published>2008-06-25T16:25:00-05:00</published><updated>2008-06-25T16:25:00-05:00</updated><content type="html"><![CDATA[<p>A couple of years ago, the Minneapolis-St. Paul airport introduced self-pay parking gates. Scan a credit card on the way in and on the way out, and it just debits the card. This obviously saves money on parking attendants, and it's pretty convenient for parkers.</p><p>At first, to encourage adoption, they offered a discount of $2 per day. Every time you'd approach the entry, a friendly voice from a Douglas Adams novel would ask, &quot;Would you like to save $2 per day on parking?&quot; For general parking, that meant $14 instead of $16 per day.</p><p>Some time later, this switched from being an incentive for adopting the system to a penalty for avoiding it. How? They raised the rates by $2 per day. So now, the top rate if you use self-pay is back to $16. If you don't use it, then your top rate bumped up to $18. Clearly they put somebody from the banking industry in charge of this parking system.</p><p>Now, it's changed again, from $2 per day to $2 per transaction. So it's just $2 off the top of whatever your overall parking fees are.</p><p>This gradual creep is really interesting. I wonder what the next step will be. A $2 per year discount would be one way to approach it. Maybe a &quot;frequent parker&quot; program. More likely the discount will drop to $1 per transaction, or it will just be discarded altogether.</p><p>That's OK with me, because swiping the credit card is still more convenient than exchanging cash money with a human anyway.</p><p>Besides, back when it was cash based, I always got tagged with the ATM fee anyway.&nbsp;</p> 
]]></content></entry><entry><title>Word Cloud Bandwagon</title><link href="https://michaelnygard.com/blog/2008/06/word-cloud-bandwagon/"/><id>https://michaelnygard.com/blog/2008/06/word-cloud-bandwagon/</id><published>2008-06-17T14:57:19-05:00</published><updated>2008-06-17T14:57:19-05:00</updated><content type="html"><![CDATA[<p><a target="_blank" href="http://www.wordle.net/">Wordle</a> has been meming its way around the 'Net lately.&nbsp; Figured I'd join the crowd by doing a word cloud for <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">Release It</a>.&nbsp; This is from the preface.</p><p><img width="832" height="571" border="0" src="/images/blog/wordle/release_it_preface.png" style="float:none; border:0px"/>&nbsp;</p>
<p>Considering that this is just from fairly simple text analysis, I'm surprised at how accurately it represents the key concerns. &quot;Software&quot; and &quot;money&quot; have roughly equal prominence. &quot;Life&quot; appears near the middle, along with &quot;excitement&quot;, &quot;revenue&quot;, &quot;production&quot; and &quot;systems&quot;. Not bad for an algorithm.</p> 
]]></content></entry><entry><title>Webber and Fowler on SOA Man-Boobs</title><link href="https://michaelnygard.com/blog/2008/06/webber-and-fowler-on-soa-man-boobs/"/><id>https://michaelnygard.com/blog/2008/06/webber-and-fowler-on-soa-man-boobs/</id><published>2008-06-07T21:39:20-05:00</published><updated>2008-06-07T21:39:20-05:00</updated><content type="html"><![CDATA[<p>InfoQ posted a video of Jim Webber and <a href="http://www.martinfowler.com/">Martin Fowler</a> doing a <a href="http://www.infoq.com/presentations/soa-without-esb">keynote speech</a> at QCon London this Spring. It's a brilliant deconstruction of the concept of the Enterprise Service Bus. I can attest that they're both funny and articulate (whether on the stage or off.)<br /></p><p>Along the way, they talk about building services incrementally, delivering value at every step along the way. They advocate decentralized control and direct alignment between services and the business units that own them.&nbsp;</p><p>I agree with every word, though I'm vaguely uncomfortable with how often they say &quot;enterprise man boobs&quot;.</p> 
]]></content></entry><entry><title>Coincidence or Back-end Problem?</title><link href="https://michaelnygard.com/blog/2008/06/coincidence-or-back-end-problem/"/><id>https://michaelnygard.com/blog/2008/06/coincidence-or-back-end-problem/</id><published>2008-06-07T09:48:24-05:00</published><updated>2008-06-07T09:48:24-05:00</updated><content type="html"><![CDATA[<p>An odd thing happened to me today. Actually, an odd thing happened yesterday, but it's having the same odd thing happen today that really makes it odd. With me so far?</p><p>Yesterday, while I was shopping at Amazon, Amazon told me that my American Express card had expired. While it is set for a May expiration, it's several years in the future. I didn't think too much of it, because when I re-entered the same information, Amazon accepted it.</p><p>Today, I got the same thing with the same card on iTunes!</p><p>Online stores don't do a whole lot with your credit cards. For the most part, they just make a call out to a credit card processor. Small stores have to go through a second-tier CCVS system that charges a few pennies per transaction. Large ones---and do they get larger than Amazon?---generally connect directly to a payment processor. The payment processor may charge a fraction of a cent per transaction, but they definitely make it up in volume.</p><p>(There are other business factors, too, like the committed transaction volume, response time SLAs, and the like.)</p><p>Asynchronously, the payment processor collects from the issuing bank. It's the issuing bank that actually bills you, and sets your interest rate and payment terms.<br /></p><p>Whereas VISA and MasterCard work with thousands of issuers, American Express doesn't. When you get an AmEx card, they are the issuing bank as well as the payment processor.</p><p>Which makes it highly suspect that the same card gave me the same error through two different sites. 
It makes me think that American Express has introduced a bug in their validation system, causing spurious declines for expiration.&nbsp;</p> 
]]></content></entry><entry><title>Social Factors</title><link href="https://michaelnygard.com/blog/2008/06/social-factors/"/><id>https://michaelnygard.com/blog/2008/06/social-factors/</id><published>2008-06-06T15:45:00-05:00</published><updated>2008-06-06T15:45:00-05:00</updated><content type="html"><![CDATA[<p>I mentioned <a href="http://en.wikipedia.org/wiki/Tom_DeMarco">Tom DeMarco</a> just a <a href="/blog/2008/06/six-word-methods/">couple of days ago</a>. I'm re-reading his great book, <a href="https://www.amazon.com/gp/product/093263334X?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=093263334X">Why Does Software Cost So Much?</a> for the first time in about ten years.</p><p>Personally, I credit Tom as one of the unsung progenitors of the agile movement. Long before we had &quot;<a href="http://agilealliance.org/">Agile</a>&quot; or even &quot;lightweight methods&quot;, Tom was talking about the psycho-social nature of software development.&nbsp;</p><p>For instance, here's an excerpt from essay 8, &quot;Nontechnological Issues in Software Engineering&quot;:</p><blockquote><p>Imagine your boss just plunked a specification on your desk and asked, &quot;How long will it take you and one other person to get this job done?&quot; What's the first question out of your mouth?</p><p>Would you ask, &quot;Can we use object-oriented methods?&quot; or &quot;What CASE system can we buy?&quot; or &quot;Is it okay to use rapid prototyping?&quot; Of course not. Your first question is,</p><p><em>Who is the other person?</em></p></blockquote><p>Absolutely. Right on, Tom.&nbsp;</p> 
]]></content></entry><entry><title>Plurk.</title><link href="https://michaelnygard.com/blog/2008/06/plurk./"/><id>https://michaelnygard.com/blog/2008/06/plurk./</id><published>2008-06-06T10:13:33-05:00</published><updated>2008-06-06T10:13:33-05:00</updated><content type="html"><![CDATA[<p>A friend invited me to <a href="http://www.plurk.com">Plurk</a>. So far, I've resisted Twitter for no good reason (other than a vague sense of social insecurity.) I figure I'll dip my toe into Plurk, though.</p><p>This <a href="http://plurk.com/redeemByURL?from_uid=39189&amp;check=104064598&amp;s=1">link</a> is an open invite to Plurk. It'll let anyone join. Fair warning, it's also a &quot;friend&quot; link.&nbsp;</p> 
]]></content></entry><entry><title>Six Word Methods</title><link href="https://michaelnygard.com/blog/2008/06/six-word-methods/"/><id>https://michaelnygard.com/blog/2008/06/six-word-methods/</id><published>2008-06-03T20:44:37-05:00</published><updated>2008-06-03T20:44:37-05:00</updated><content type="html"><![CDATA[
<p>In his great collection of essays <a href=
"https://www.amazon.com/gp/product/093263334X?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=093263334X">
Why Does Software Cost So Much?</a>,
Tom DeMarco makes the interesting point that the software industry
had grown from zero to $300 billion (in 1993). This
indicates that the market had at least $300B worth of demand for
software, <em>even while complaining continuously</em> about the
cost and quality of the very same software. It seems to me that the
demand for software production, together with the time and cost
pressures, has only increased dramatically since then.</p>

<p>(DeMarco enlightens us that the perennial question, "Why does
software cost so much?" is not really a question at all, but rather
a goad or a <a href= "http://www.michaelnygard.com/blog/2007/12/budgetecture-and-its-ugly-cous/">negotiation</a>. Also very true.)</p>

<p>Fundamentally, the demand for software production far outstrips
our industry's ability to supply it. In fact, I believe that we can
classify most software methods and techniques by their relation and
response to the problem of surplus demand. Some try to optimize for
least-cost production, others for highest quality, still others for
shortest cycle time.</p>

<p>In the spirit of <a target="_blank" href="http://www.npr.org/2008/02/07/18768430/six-word-memoirs-life-stories-distilled">six-word memoirs</a>, here are the sometimes dubious responses that
various technologies and development methods offer to the
overwhelming demand for software production.&nbsp;</p>
<p><strong>Waterfall</strong>: Nevermind backlog, requirements were
signed off.</p>
<p><strong>RAD</strong>: Build prototypes faster than discarding
them.</p>
<p><strong>Offshore outsourcing</strong>: Army of cheap developers
producing junk.</p>
<p><span style="font-weight: bold">Onshore outsourcing</span>: Same
junk, but with expensive developers.</p>
<p><strong>Agile</strong>: Avoid featuritis; outrun pesky business
users.<br></p>
<p><strong>Domain-specific languages</strong>: Compress every
problem into one-liners.<br></p>
<p><strong>CMMi</strong>: Enough Process means nothing's ever
wasted.<br></p>
<p><strong>Relational Databases</strong>: Code? Who cares? Data
lives forever.</p>
<p><strong>Model-driven architecture</strong>: Jackson Pollock's
models into inscrutable code.<br></p>
<p><strong>Web Services</strong>: Terrorize XML until maximum reuse
achieved.</p>
<p><strong>FORTH</strong>: backward writing IF punctuation time
SAVE.<br></p>
<p><strong>SOA</strong>: Iron-fisted governance ensures total
calcification.<br></p>
<p><strong>Intentional programming</strong>: Parallelize
programming... make programmers of everyone.<br></p>
<p><strong>Google as IDE</strong>: It's been done, probably in
<a target="_blank" href=
"http://en.wikipedia.org/wiki/Befunge">Befunge</a>.<br></p>
<p><strong>Open-source</strong>: Bury the world in abandoned
code.</p>
<p><strong>Mashups</strong>: Parasitize others' apps, then APIs
change.</p>
<p><strong>LISP</strong>: With enough macros, one uberprogrammer
suffices.</p>
<p><span style="font-weight: bold">perl</span>: Too busy coding to
maintain anyway.</p>
<p><span style="font-weight: bold">Ruby</span>: Meta-programming:
same problems, mysterious solutions.</p>
<p><span style="font-weight: bold">Ocaml</span>: No, try
meta-meta-meta-programming.</p>
<p><span style="font-weight: bold">Groovy</span>: Faster Java
coding, runs like C-64.</p>
<p><span style="font-weight: bold">Software-as-a-Service</span>:
Don't write your own, rent ours.</p>
<p><span style="font-weight: bold">Cloud Computing</span>:
Programmers would go faster without administrators.</p>
]]></content></entry><entry><title>New Article: S2AP + Eclipse + Maven walkthrough</title><link href="https://michaelnygard.com/blog/2008/05/new-article-s2ap--eclipse--maven-walkthrough/"/><id>https://michaelnygard.com/blog/2008/05/new-article-s2ap--eclipse--maven-walkthrough/</id><published>2008-05-30T18:52:17-05:00</published><updated>2008-05-30T18:52:17-05:00</updated><content type="html"><![CDATA[<p>See Getting Started With SpringSource Application Platform, Eclipse, and Maven.</p><p>Most of the information out there about programming in S2AP is in blogs or references to really old OSGi tutorials. It took me long enough to configure some basic Eclipse project support that I figured it was worth writing down. All of the frameworks and tool sets are very flexible, which means you have more choices to deal with when setting up a project. Sometimes, being concrete helps... there may be a lot of options, but when it's time to do a project, you only care about one set of choices for those options. This guide is completely specific to using Eclipse to write bundle projects for SpringSource Application Platform.</p><p>If that's your specific set of needs, great! If not, that's OK too, because the beauty of the Web is that somebody else will have a tutorial on your exact combination, too.&nbsp;</p> 
]]></content></entry><entry><title>Canadian Privacy Commissioner Highlights Cloud Privacy Concerns</title><link href="https://michaelnygard.com/blog/2008/05/canadian-privacy-commissioner-highlights-cloud-privacy-concerns/"/><id>https://michaelnygard.com/blog/2008/05/canadian-privacy-commissioner-highlights-cloud-privacy-concerns/</id><published>2008-05-28T13:42:26-05:00</published><updated>2008-05-28T13:42:26-05:00</updated><content type="html"><![CDATA[<p>A little while ago, I wrote a <a href="/blog/2008/04/geography-imposes-itself-on-th/">piece</a> about the conflict between &quot;clouds&quot; and the hard boundaries of the political sphere. There's no physical place called &quot;cyberspace&quot;, and any cloud computing infrastructure has to actually exist somewhere.</p><p>Like many U.S. citizens, I really hate the idea that facts about me become somebody else's copyrighted property just because they get stored in a database. Canada has a justifiably good reputation for protecting its citizens' privacy. Their legal framework takes the refreshing position of protecting individuals rather than protecting the ability of non-corporeal entities (a.k.a. &quot;incorporated persons&quot;, a.k.a. &quot;corporations&quot;) to collect any and all information.</p><p>I hadn't realized that there were such offices as the &quot;Information and Privacy Commissioner of Ontario&quot;, however.</p><p>Better still, Ontario's IPC Commissioner, Dr. Ann Cavoukian, is very current. She's just released a white paper on the privacy implications of cloud computing. She's calling for open standards around digital identity management, and outlines some technological building blocks needed for controllable trust and identity verification.</p><p>Unlike the U.S. approach to identity verification, Dr. Cavoukian's approach has nothing to do with catching illegal aliens, welfare frauds, or terrorists. 
Instead, it's about creating open, trustworthy ways for humans to interact in all their various modalities from commerce, to entertainment, and even to romance.&nbsp;</p> 
]]></content></entry><entry><title>Quickie: GAE is GA</title><link href="https://michaelnygard.com/blog/2008/05/quickie-gae-is-ga/"/><id>https://michaelnygard.com/blog/2008/05/quickie-gae-is-ga/</id><published>2008-05-28T12:56:12-05:00</published><updated>2008-05-28T12:56:12-05:00</updated><content type="html"><![CDATA[<p>According to eWeek, Google will make GAE open to public use on May 28th.&nbsp; Which would be today.</p><p>The <a href="http://code.google.com/appengine/">original GAE site</a> isn't updated at this point, but you can <a target="_blank" href="http://appengine.google.com">get started</a> anyway.&nbsp; I just set up my account and registered an app. (I predict tens of thousands of empty apps. Long-tail distribution here, just like SourceForge: an overwhelming majority of empty projects, with a vanishingly tiny minority that have 99% of the traffic.)<br /></p><p>Now I just need to find time to learn Python and write something cool.&nbsp;</p> 
]]></content></entry><entry><title>Wii Wescue</title><link href="https://michaelnygard.com/blog/2008/05/wii-wescue/"/><id>https://michaelnygard.com/blog/2008/05/wii-wescue/</id><published>2008-05-16T15:56:58-05:00</published><updated>2008-05-16T15:56:58-05:00</updated><content type="html"><![CDATA[
<p>So, I got a Wii for Father's Day last year. It's been a lot of fun to play
together with my kids, my wife, and even my parents and in-laws. It's fantastic
to have a game system that we can all play together and be reasonably
competitive.&nbsp; My six-year-old can hold her own in Wii bowling, but she cries
a lot when we play Halo. (I'm just kidding...)</p>

<p>Unfortunately, my three-year-old put a shiny disc of her own into it: a
plastic toy coin. Well, it does say &quot;Play Money&quot; right on the
front. Right in the drive slot. I figured my Wii was a goner for sure.</p>

<p><img width="240" height="144" title="&quot;Play Money&quot;" alt="&quot;Play
Money&quot;"
src="https://farm4.static.flickr.com/3164/2497334455_94ea9b20fd_m.jpg"
style="border: 0; align=center;"/>&nbsp;</p>

<p>I set about opening the thing up to remove the coin, but got stumped by these
custom screws, kind of like a Phillips head, but with three prongs. Turns out
these are called &quot;<a
href="http://en.wikipedia.org/wiki/Triwing">Triwing</a>&quot; screws and they're
specifically designed to keep end users out of the machine, on the theory that
these are not widely used screws, so most people won't have the means to unscrew
them. True, it slowed me down a bit. I had to order a kit from Thinkgeek that has
driver bits for every console on the market.</p>

<p>Opened it up, got the coin out, and the Wii still works!</p>

<p>But, surely these belong <em>somewhere</em>, don't they?</p>

<p style="height: 700px;">
<img width="500" height="300" border="0" title="Leftovers?" alt="Leftovers?" src="https://farm4.static.flickr.com/3166/2498164200_55a8e98e44.jpg" style="border: 0; align=center;" />&nbsp;</p>
]]></content></entry><entry><title>Opening Up SpringSource AP</title><link href="https://michaelnygard.com/blog/2008/05/opening-up-springsource-ap/"/><id>https://michaelnygard.com/blog/2008/05/opening-up-springsource-ap/</id><published>2008-05-14T15:57:20-05:00</published><updated>2008-05-14T15:57:20-05:00</updated><content type="html"><![CDATA[<p>Just now getting my hands on the SpringSource Application Platform. It's deceptive, because there's very little functionality exposed when you run it. It starts up with less ceremony than Apache or Tomcat. (Which is kind of funny, when you consider that it <em>includes</em> Tomcat.)</p>

<p>When you look at the bundle repository, though, it's clear that a lot of stuff is packaged in here. In a way, that's like the Spring framework itself. On the surface, it looks like just a bean configurator. All the <span style="font-style: italic">really</span> powerful stuff is in the libraries built out of that small core.</p>

<p>Here's a quick listing of the bundles in version 1.0.0.beta:&nbsp;</p>

<pre>
./bundles/ext/com.springsource.com.google.common.collect-0.5.0.alpha.jar
./bundles/ext/com.springsource.edu.emory.mathcs.backport-3.0.0.jar
./bundles/ext/com.springsource.javax.activation-1.1.0.jar
./bundles/ext/com.springsource.javax.annotation-1.0.0.jar
./bundles/ext/com.springsource.javax.ejb-3.0.0.jar
./bundles/ext/com.springsource.javax.el-2.1.0.jar
./bundles/ext/com.springsource.javax.jms-1.1.0.jar
./bundles/ext/com.springsource.javax.mail-1.4.0.jar
./bundles/ext/com.springsource.javax.persistence-1.0.0.jar
./bundles/ext/com.springsource.javax.servlet-2.5.0.jar
./bundles/ext/com.springsource.javax.servlet.jsp-2.1.0.jar
./bundles/ext/com.springsource.javax.servlet.jsp.jstl-1.1.2.jar
./bundles/ext/com.springsource.javax.xml.bind-2.0.0.jar
./bundles/ext/com.springsource.javax.xml.rpc-1.1.0.jar
./bundles/ext/com.springsource.javax.xml.soap-1.3.0.jar
./bundles/ext/com.springsource.javax.xml.stream-1.0.1.jar
./bundles/ext/com.springsource.javax.xml.ws-2.1.1.jar
./bundles/ext/com.springsource.json-1.0.0.BUILD-20080422112602.jar
./bundles/ext/com.springsource.org.antlr-3.0.1.jar
./bundles/ext/com.springsource.org.aopalliance-1.0.0.jar
./bundles/ext/com.springsource.org.apache.catalina-6.0.16.jar
./bundles/ext/com.springsource.org.apache.commons.fileupload-1.2.0.jar
./bundles/ext/com.springsource.org.apache.commons.io-1.4.0.jar
./bundles/ext/com.springsource.org.apache.commons.logging-1.1.1.jar
./bundles/ext/com.springsource.org.apache.coyote-6.0.16.jar
./bundles/ext/com.springsource.org.apache.el-6.0.16.jar
./bundles/ext/com.springsource.org.apache.jasper-6.0.16.jar
./bundles/ext/com.springsource.org.apache.jasper.org.eclipse.jdt-6.0.16.jar
./bundles/ext/com.springsource.org.apache.juli.extras-6.0.16.jar
./bundles/ext/com.springsource.org.apache.taglibs.standard-1.1.2.jar
./bundles/ext/com.springsource.org.aspectj.runtime-1.6.0.m2.jar
./bundles/ext/com.springsource.org.aspectj.weaver-1.6.0.m2.jar
./bundles/ext/com.springsource.slf4j.org.apache.commons.logging-1.5.0.jar
./bundles/ext/com.springsource.slf4j.org.apache.log4j-1.5.0.jar
./bundles/ext/org.springframework.aop-2.5.4.A.jar
./bundles/ext/org.springframework.aspects-2.5.4.A.jar
./bundles/ext/org.springframework.beans-2.5.4.A.jar
./bundles/ext/org.springframework.context-2.5.4.A.jar
./bundles/ext/org.springframework.context.support-2.5.4.A.jar
./bundles/ext/org.springframework.core-2.5.4.A.jar
./bundles/ext/org.springframework.jdbc-2.5.4.A.jar
./bundles/ext/org.springframework.jms-2.5.4.A.jar
./bundles/ext/org.springframework.orm-2.5.4.A.jar
./bundles/ext/org.springframework.osgi.core-1.1.0.M2A.jar
./bundles/ext/org.springframework.osgi.extender-1.1.0.M2A.jar
./bundles/ext/org.springframework.osgi.io-1.1.0.M2A.jar
./bundles/ext/org.springframework.transaction-2.5.4.A.jar
./bundles/ext/org.springframework.web-2.5.4.A.jar
./bundles/ext/org.springframework.web.portlet-2.5.4.A.jar
./bundles/ext/org.springframework.web.servlet-2.5.4.A.jar
./bundles/ext/org.springframework.web.struts-2.5.4.A.jar
./bundles/subsystems/com.springsource.platform.common/com.springsource.platform.common.env-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.common/com.springsource.platform.common.math-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.concurrent/com.springsource.platform.concurrent.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.config/com.springsource.platform.config.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.control/com.springsource.platform.control.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.deployer/com.springsource.platform.deployer.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.deployer/com.springsource.platform.deployer.hot-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.ffdc/com.springsource.platform.ffdc.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.io/com.springsource.platform.io.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.kernel/com.springsource.platform.kernel.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.kernel/com.springsource.platform.kernel.dm-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.management.proxy/com.springsource.platform.management.proxy-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.profile/com.springsource.platform.profile.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.serviceability/com.springsource.platform.serviceability.ffdc-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.serviceability/com.springsource.platform.serviceability.ffdc.aspects-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.serviceability/com.springsource.platform.serviceability.tracing.aspects-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.servlet/com.springsource.platform.servlet.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.servlet/com.springsource.platform.servlet.tomcat-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.system/com.springsource.platform.system.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.web/com.springsource.platform.web.core-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.web/com.springsource.platform.web.dm-1.0.0.beta.jar
./bundles/subsystems/com.springsource.platform.web/com.springsource.platform.web.support-1.0.0.beta.jar
</pre>

<p>There's clearly a lot of functionality built in, but how do you get at it? The SAP, erm, SpringSource AP documentation screams for improvement. Maybe they think that, because all the parts are documented elsewhere, there's no need for any integrated docset. If so, they would be wrong. Despite that, I'm interested enough to keep poking away at it.</p>

<p>Oh, and one other thing: the default administrator account is <tt>admin/springsource</tt>. (It's actually defined in <tt>servlet/conf/tomcat-users.xml</tt>.) For some reason, that's buried in chapter 5 of the user guide. It would be handy to make that more prominent.</p> 
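<p>For reference, an account entry in a Tomcat 6 <tt>tomcat-users.xml</tt> file generally takes the shape below. This is only a sketch of the file format: the username and password are the defaults mentioned above, but the role name is a placeholder I made up for illustration, not something copied from the shipped file.</p>

<pre>
&lt;tomcat-users&gt;
  &lt;!-- "example-admin-role" is a hypothetical role name; check the shipped file for the real one --&gt;
  &lt;role rolename="example-admin-role"/&gt;
  &lt;user username="admin" password="springsource" roles="example-admin-role"/&gt;
&lt;/tomcat-users&gt;
</pre>

<p>If the platform is reachable from anywhere beyond your own machine, the <tt>password</tt> attribute on that entry is probably the first thing worth changing.</p>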
]]></content></entry><entry><title>JavaOne is a Hot Zone</title><link href="https://michaelnygard.com/blog/2008/05/javaone-is-a-hot-zone/"/><id>https://michaelnygard.com/blog/2008/05/javaone-is-a-hot-zone/</id><published>2008-05-09T13:44:53-05:00</published><updated>2008-05-09T13:44:53-05:00</updated><content type="html"><![CDATA[<p>Apparently, there's a virus attack. Not a computer virus. A real virus. Hot zone instead of a hot spot.<br /></p><p>From my inbox this morning:</p><blockquote><p>The JavaOne conference team has been notified by the San Francisco Department of Public Health about an identified outbreak of a virus in the San Francisco area. Testing is still underway to identify the specific virus in question, but they believe it to be the Norovirus, a common cause of the &quot;stomach flu&quot;, which can cause temporary flu-like symptoms for up to 48 hours. Part of the San Francisco area impacted includes the Moscone Center, the site of the JavaOne conference which is being held this week. We are working with the appropriate San Francisco Department of Public Health and Moscone representatives to mitigate the impact this will have on the conference and steps are being taken overnight to disinfect the facility. We have not received any indication that the show should end early, so will have the full schedule of events on Friday as planned. We hope to see you then. <br /><br />   Please see the attached notification from the Department of Public Health.  <br /><br /> For further information, as well as Frequently Asked Questions related to the Norovirus, please visit the San Francisco Department of Public Health website at http://sfcdcp.org/norovirus.cfm&nbsp;</p></blockquote><p>The CDC description includes the phrase &quot;acute gastroenteritis.&quot;&nbsp;</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Grab Bag of Demos</title><link href="https://michaelnygard.com/blog/2008/05/grab-bag-of-demos/"/><id>https://michaelnygard.com/blog/2008/05/grab-bag-of-demos/</id><published>2008-05-09T12:54:18-05:00</published><updated>2008-05-09T12:54:18-05:00</updated><content type="html"><![CDATA[<p>Sun opened the final day of JavaOne with a general session called &quot;Extreme Innovation&quot;. This was a showcase for novel, interesting, and out-of-this-world uses of Java-based technology.</p><p><strong>VisualVM</strong>&nbsp;</p><p>VisualVM works with local or remote applications, using JMX over RMI to connect to remote apps. While you have to run VisualVM itself under JDK 1.6, it can connect to any version of the JVM from 1.4.2 through 1.7. Local apps are automatically detected and offered in the UI for debugging.&nbsp; VisualVM uses the <a href="http://java.sun.com/javase/technologies/core/toolsapis/jpda/">Java Platform Debugger Architecture</a> to show thread activities, memory usage, object counts, and call timing. It can also take snapshots of the application's state for post-mortem or remote analysis.</p><p>Memory problems can be a bear to diagnose. VisualVM includes a heap analyzer that can show reference chains. From the UI, it looks like it can also detect and indicate reference loops.</p><p>One interesting feature of VisualVM is the ability to add plug-ins for application-specific behavior. Sun demonstrated a Glassfish plugin that adds custom metrics for request latency and volume, and the ability to examine each application context independently.</p><p>The application does not require any special instrumentation, so you can run VisualVM directly against a production application. According to Sun, it adds &quot;almost no overhead&quot; to the application being examined. I'd still be very cautious about that. VisualVM allows you to enable CPU and memory profiling in real time, so that will certainly have an effect on the application. 
Not to mention, it also lets you trigger a heap dump, which is always going to be costly.</p><p>VisualVM is available for download now.</p><p><strong>JavaScript Support in NetBeans</strong></p><p>Sun continues to push NetBeans at every turn. In this case, it was a demo of the JavaScript plugin for NetBeans. This is really a nice plugin. It uses type inferencing to provide autocompletion and semantic warnings. For example, it would warn you if a function had inconsistent return statements. (Such as returning an object from one code path mixed with a void return from another.)</p><p>It also has a handy developer aid: it warns developers about browser compatibility.</p><p>I don't do a whole lot of JavaScript, but I couldn't help thinking about other dynamic languages. If the plugin can do that kind of type inferencing---without executing the code---for one dynamic language, then it should be possible to do for other dynamic languages. That could remove a lot of objections about Groovy, Scala, JRuby, etc.</p><p><strong>Fluffy Stuff at the Edge</strong></p><p>We got a couple of demos of Java in front of the end-user. One was a cell phone running an OpenGL scene at about 15 frames per second on an <a href="http://www.nvidia.com/object/apx_2500.html">NVidia chipset</a>. All the rendering was done in Java and displayed via <a href="http://www.khronos.org/opengles/">OpenGL ES</a>, with 3D positional audio. Not bad at all.</p><p><a href="http://www.projectdarkstar.com/">Project Darkstar</a> got a few moments in the spotlight, too. They showed off a game called <a href="http://www.callofthekings.com/main.php">Call of the Kings</a>, a multiplayer RTS that looked like it came from 1999.&nbsp; Call of the Kings uses the jMonkey Engine (built on top of JOAL, JOGL, and jInput) on the client and Project Darkstar's game server on the backend. 
It's OK, but as game engines go, I'm not sure how it will be relevant.</p><p>There was also a JavaCard demo, running Robocode players on JavaCards.&nbsp; That's not just storing the program on the card; it was actually executing on the card. Two finalists were brought up on stage (but not given microphones of their own, I noticed) for a final battle between their tanks. Yellow won, and received a PS3. Red lost, but got a PSP for making it to the finals.</p><p>Sentilla tried to get out from the &quot;<a href="http://www.theserverside.com/news/thread.tss?thread_id=49332">creepy</a>&quot; moniker by bouncing mesh-networked, location-tracking beachballs around the audience. Each one had a Sentilla &quot;mote&quot; in it, with a 3D accelerometer inside. Receivers at the perimeter of the hall could triangulate the beachballs' locations by signal strength. For me, the most interesting thing here was James Gosling's talk about powering the motes. They draw so little power that it's possible to power them from ambient sources: vibration and heat. Interesting. Still creepy, but interesting.</p><p>The next demo was mind-blowing. The <a href="http://www.livescribe.com/">livescribe pulse</a> is a Java computer built into a pen. It's hard to describe how wild this thing is; you almost have to see it for any of this to make sense.</p><p>At one point, the presenter wrote down a list, narrating as he went. For item one, he wrote the numeral &quot;1&quot; and the word &quot;pulse&quot;, describing the pen as he went. For item two, he wrote the numeral &quot;2&quot; and drew a little doodle of a desktop. Item three was the numeral and a vague cloudy thing. All this time, the pen was recording his audio, and associating bits of the audio stream with the page locations. So when he tapped the numeral &quot;1&quot; that he had written, the pen played back his audio. Not bad.</p><p>Then he put an &quot;application card&quot; on the table and tapped &quot;Spanish&quot; on it. 
He wrote down the word &quot;one&quot;... and the pen spoke the word &quot;uno&quot;.&nbsp; He wrote &quot;coffee please&quot; and it said &quot;cafe por favor&quot;. Then he had it do the same phrase in Mandarin and Arabic. Handwriting recognition, machine translation, and speech synthesis all in the pen. Wow.<br /></p><p>Next, he selected a program from the pen's menu. The special notebook has a menu crosshair on it, but you can draw your own crosshair and it works the same way: use the pen to tap the up-arrow on paper, and the menu changes on the display. He picked a piano program, and the pen started to give him directions on how to draw a piano. Once he was done drawing it, he could tap the &quot;keys&quot; on paper to play notes.<br /></p><p>The pen captures x, y, and t information as you write, so it's digitizing the trajectory rather than the image. This is great for data compression when you're sharing pages across the livescribe web site. It's probably also great for forgers, so there might be a concern there.</p><p><strong>Industrial Strength</strong></p><p>Emphasizing real-time Java for a bit, Sun showed off &quot;Blue Wonder&quot;, an industrial controller built out of an x86 computer running Solaris 10 and Java RTS 2.0.&nbsp; This is suitable for factory control applications and is, apparently, very exciting to factory control people.</p><p>From the DARPA Urban Challenge event, we saw &quot;<a href="http://video.aol.com/video-detail/tommy-jr-crash-at-darpa-urban-challenge-nqe/1253754136">Tommy Jr.</a>&quot;, an autonomous vehicle. It followed Paul Perrone into the room, narrating each move it was making. Fortunately, nobody tried to demonstrate its crowd control or <a href="http://www.youtube.com/watch?v=6MrmUBrT2p4">law enforcement features</a>. Instead, they showed off an array of high-resolution sensors and actuators. 
It's all controlled, under very tight real-time constraints, by a single x86 board running Solaris and Java RTS.</p><p><strong>Into New Realms</strong></p><p>Next, we saw a demo of <a href="http://jmars.asu.edu/">JMars</a>. This impressive application helps scientists make sense out of the 150 terabytes of data we've collected from various Mars probes. It combines data and imaging layers from many different probes. One example overlaid hematite concentrations on top of an infrared image layer. It also knows enough about the various satellites' orbits to help plan imaging requests.</p><p>Ultimately, JMars was built to help target landing sites for both scientific interest and technical viability. We'll soon see how well they did: the Phoenix lander arrives in about two weeks, targeting a site that was selected using JMars.</p><p>JMars is both free to use and open source. Dr. Phil Christensen from Arizona State University invited the Java community to explore Mars for themselves, and perhaps join the project team.</p><p><strong>CERN<br /></strong>Thousands of people, physicists and otherwise, are eagerly awaiting the <a href="http://www.cern.ch/LHC/">LHC</a>'s activation. We got to see a little bit behind the scenes about how Java is being used within CERN.</p><p>On the one hand, some very un-sexy business process work is being done. LHC is a vast project, so it's got people, budget, and materials to manage. Ho hum. It's not easy to manage all those business processes, but it sure doesn't demo well.</p><p>On the other hand, showing off the grid computing infrastructure does.</p><p>Once it's operating, the <a href="http://atlas.ch/">ATLAS</a> detectors alone will produce a gigabyte an hour of image data. All of it needs to be processed. &quot;Processing&quot; here means running through some amazing pattern recognition programs to analyze events, looking for anomalies. 
There will be far too many collisions generated every day for a physicist to look at all of them, so automated techniques have to weed out &quot;uninteresting&quot; collisions and call attention to ones that don't fit the profile.</p><p>CERN estimates that 100,000 CPUs will be needed to process the data. They've built a coalition of facilities into a multi-tier grid. Even today, they're running 16,000 jobs on the grid across hundreds of data centers. With that many nodes involved, they need some good management and visualization tools, and we got to see one. It's a 3D world model with iconified data centers showing their status and capacity. Jobs fly from one to another along geodesic links. Very cool stuff.</p><p><strong>Summary</strong></p><p>Java is a mature technology that's being used in many spheres other than application server programming. For me, and many other JavaOne attendees, this session really underscored the fact that none of our own projects are anywhere near as cool as these demos. I'm left with the desire to go build something cool, which was probably the point.<br /><br /></p> 
]]></content></entry><entry><title>SOA: Time For a Rethink</title><link href="https://michaelnygard.com/blog/2008/05/soa-time-for-a-rethink/"/><id>https://michaelnygard.com/blog/2008/05/soa-time-for-a-rethink/</id><published>2008-05-08T16:00:00-05:00</published><updated>2008-05-08T16:00:00-05:00</updated><content type="html"><![CDATA[<p>The notion of a service-oriented architecture is real, and it can deliver. The term &quot;SOA&quot;, however, has been entirely hijacked by a band of dangerous <a target="_blank" href="/blog/2007/11/soa-without-the-edifice/">Taj Mahal architects</a>. They seem innocuous, in part because they'll lull you to sleep with endless protocol diagrams. Behind the soporific technology discussion lies a grave threat to your business. </p><p>&quot;SOA&quot; has come to mean <a target="_blank" href="/blog/2008/05/saps-soa-esr-1/">top-down, up-front, strong-governance, all-or-nothing</a> process (moving at glacial speed) implemented by an ill-conceived stack of technologies. SOAP is not the problem. WSDL is not the problem. Even BPEL is not the problem. The problem begins with the entire world view.<br /></p><p>We need to abandon the term &quot;SOA&quot; and invent a new one. &quot;SOA&quot; is chasing a false goal. The idea that services will be so strongly defined that no integration point will ever break is unachievable. Moreover, it's optimizing for the wrong thing. Most businesses today are not safety-critical. Instead, they are highly competitive.<br /> </p><p>We need loosely-coupled services, not orchestration.<br /></p><p>We need services that emerge from the business units they serve, not an IT governance panel.</p><p>We need services to change as rapidly as the business itself changes, not after a chartering, funding, and governance cycle.<br /><br />Instead of trying to build an antiseptic, clockwork enterprise, we need to embrace the messy, chaotic, Darwinian nature of business. 
We should be enabling rapid experimentation, quick rollout of &quot;barely sufficient&quot; systems, and fast feedback. We need to enable emergence, not stifle it.<br /></p><p>Anything that slows down that cycle of experimentation and adjustment puts your business on the same evolutionary path as the Great Auk. I never thought I'd find myself quoting Tom Peters in a tech blog, but the key really is to &quot;Test fast, fail fast, adjust fast.&quot;<br /></p> 
]]></content></entry><entry><title>The JVM is Great, But</title><link href="https://michaelnygard.com/blog/2008/05/the-jvm-is-great-but/"/><id>https://michaelnygard.com/blog/2008/05/the-jvm-is-great-but/</id><published>2008-05-08T14:29:06-05:00</published><updated>2008-05-08T14:29:06-05:00</updated><content type="html"><![CDATA[<p>Much of the interest in dynamic languages like Groovy, JRuby, and <a href="http://www.scala-lang.org/">Scala</a> comes from running on the JVM. That lets them leverage the tremendous R&amp;D that&rsquo;s gone into JVM performance and stability. It also opens up the universe of Java libraries and frameworks.</p>
<p>And yet, much of my work deals with the 80% of cost that comes after the development project is done. I deal with the rest of the software&rsquo;s lifetime. The end of development is the beginning of the software&rsquo;s life. Throughout that life, many of the biggest, toughest problems exist around and between the JVMs: scalability, stability, interoperability, and adaptability.</p>
<p>As I previously showed in <a href="/blog/2008/03/reality/">this graphic</a>, the easiest thing for a Java developer to create is a slow, unscalable, and unstable web application. Making high-performance, robust, scalable applications still requires serious expertise. This is a big problem, and I don&rsquo;t see it getting better. Scala might help here in terms of stability, but I&rsquo;m not yet convinced it&rsquo;s suitable for the largest masses of Java developers. Normal attrition means that the largest population of developers will always be the youngest and least experienced. This is not a training problem: in the post-commoditization world, the majority of code will always be written by undertrained, least-cost coders. That means we need platforms where the easiest thing to do is also the right thing to do.</p>
<p>Scaling distributed systems has gotten better over the last few years. Distributed memory caching has reached the mainstream. <a href="http://www.terracotta.org/">Terracotta</a> and Coherence are both mature products, and they both let you try them out for free. In the open source crowd, as usual, you lose some manageability and some time-to-implement, but the projects work when you use them right. All of these do the job of connecting individual JVMs to a caching layer. On the other hand, I can&rsquo;t help but feel that the need for these products points to a gap in the platform itself.</p>
<p><a href="http://www.osgi.org/">OSGi</a> is finally reaching the mainstream. It&rsquo;s been needed for a long time, for a couple of reasons. First, it&rsquo;s still too common to see gigantic classpaths containing multiple versions of JAR files, leading to the perennial joy of finding obscure, it-works-fine-in-QA bugs. So, keeping individual projects in their own bundles, with no classpath pollution will be a big help. Versioning application bundles is also important for application management and deployment. OSGi is what we should have had since the beginning, instead of having the classpath inflicted on us.</p>
<p>I predict that we&rsquo;ll see more production operations moving to hot deployment on OSGi containers. For enterprise services that require 100% uptime, it&rsquo;s just no longer acceptable to bring down the whole cluster in order to do deployments. Even taking an entire server down to deploy a new revision may become a thing of the past. In the <a href="http://www.erlang.org/">Erlang</a> world, it&rsquo;s common to see containers running continuously for months or years. In <a href="http://www.amazon.com/gp/product/193435600X?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=193435600X">Programming Erlang</a>, Joe Armstrong talks about sending an Erlang process a message to &ldquo;become&rdquo; a new package. It works without disrupting any current requests and it happens atomically between one service request and the next. (In fact, Joe says that one of the first things he does on a new system is deploy the container processes, at the very beginning of the project. Later, once he knows what the system is supposed to do, he deploys new packages into those containers.) Hot deployment can be safe, if the code being deployed is sufficiently decoupled from the container itself. OSGi does that.</p>
<p>OSGi also enables strong versioning of the bundles and their dependencies. This is an all-around good thing, since it will let developers and operations agree on exactly which versions of which components belong in production at a given time.</p>
]]></content></entry><entry><title>SAP's SOA ESR</title><link href="https://michaelnygard.com/blog/2008/05/saps-soa-esr/"/><id>https://michaelnygard.com/blog/2008/05/saps-soa-esr/</id><published>2008-05-07T13:01:22-05:00</published><updated>2008-05-07T13:01:22-05:00</updated><content type="html"><![CDATA[<p>SAP has been talking up their suite of SOA tools. The names all run together a bit, since they each contain some permutation of &quot;enterprise&quot; and &quot;builder&quot;, but it's a <em>very</em> impressive set of tools.</p><p>Everything SAP does comes off of an enterprise service repository (ESR). This includes a UDDI registry, and it supports service discovery and lookup. Development tools allow developers to search and discover services through their &quot;ES Workspace&quot;. Interestingly, this workspace is open to partners as well as internal developers.</p><p>From the ESR, a developer can import enough of a service definition to build a composite application. Composite applications include business process definitions, new services of their own, local UI components, and remote service references.</p><p>Once a developer creates a composite application, it can be deployed to a local container or a test server. Presumably, there's a similar tool available for administrators to deploy services, composite applications, and other enterprise components onto servers.</p><p>Through it all, the complete definition of every component goes into the ESR.</p><p>In order to make the entire service lifecycle work, SAP has defined a strong meta-model and a very strong governance process.</p><p>This is the ultimate expression of the top-down, strong-governance model for enterprise SOA.</p><p>If you're into that sort of thing.</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Type Inference Without Gagging</title><link href="https://michaelnygard.com/blog/2008/05/type-inference-without-gagging/"/><id>https://michaelnygard.com/blog/2008/05/type-inference-without-gagging/</id><published>2008-05-07T09:34:09-05:00</published><updated>2008-05-07T09:34:09-05:00</updated><content type="html"><![CDATA[<p>I am not a language designer, nor a researcher in type systems. My concerns are purely pragmatic. I want usable languages, ones where doing the easy thing also happens to be doing the right thing.</p>

<p>Even today, I see a lot of code that handles exceptions poorly (whether checked or unchecked!). Even after 13 years and some trillion words of books, most Java developers barely understand when to synchronize code. (And, by the way, I now believe that there's only <a href="/blog/2008/01/two-books-that-belong-in-your/">one book on concurrency</a> you actually need.)</p>

<p>I still recall the agony of converting a large C++ code base to const-correctness. That's something that you can't just do a little bit. You add one little &quot;const&quot; keyword and sooner or later, you end up writing some gibberish that looks like:</p>
<pre>
const int foo(const char * const * blah) = const;
</pre>
<p><small>I'm exaggerating a little bit, but I bet somebody more current on C++ can come up with an even worse example.</small></p>

<p>That's the path I don't want to see Java tread.</p>

<p>On the other hand, Robert Fischer pointed out that type inference doesn't have to hurt so much. His post on OCAML's type inferencing system is a breath of fresh air.</p>

<p>There's quite a bit of other interesting stuff in there, too. I particularly like this remark:</p>

<blockquote>What the Rubyists call a &quot;DSL&quot;, Ocamlists call &quot;readable code&quot;.</blockquote>

<p>I'm still working on wrapping my head around Erlang right now (it's my &quot;new language&quot; for 2008), but I might just have to give OCAML preferred position for my 2009 new language.</p> 
]]></content></entry><entry><title>When Should You Jump? JSR 308. That's When.</title><link href="https://michaelnygard.com/blog/2008/05/when-should-you-jump-jsr-308.-thats-when./"/><id>https://michaelnygard.com/blog/2008/05/when-should-you-jump-jsr-308.-thats-when./</id><published>2008-05-06T23:10:11-05:00</published><updated>2008-05-06T23:10:11-05:00</updated><content type="html"><![CDATA[<p>One of the frequently asked questions at the <a href="https://www.nofluffjuststuff.com/" target="_blank">No Fluff, Just Stuff</a> expert panels boils down to, &quot;When should I get off the Java train?&quot; There may be good money out there for the last living COBOL programmer, but most of the Java developers we see still have a lot of years left in their careers, too many to plan on riding Java off into its sunset.</p>

<p>Most of the panelists talk about the long future ahead of Java the Platform, no matter what happens with Java the Language. Reasonable. I also think that a young developer's best bet is to stick with the Boy Scout motto: Be Prepared. Keep learning new languages
 and new programming paradigms. Work in many different domains, styles, and architectures. That way, no matter what the future brings, the prepared developer can jump from one train to the next.</p>

<p>After today, I think I need to revise my usual answer.</p>

<p>When should a Java developer jump to a new language? Right after JSR 308 becomes part of the language.</p><p>Beware: this stuff is like Cthulhu rising from the vasty deep. There's an internal logic here, but if you're not mentally prepared, it could strip away your sanity like a chill wind across a foggy moor. I promise that's the last <a href="https://wordsmith.org/words/hypergolic.html">hypergolic</a> metaphor. Besides, that was <a href="/blog/2008/05/project-hydrazine/">another post</a>.<br /></p>

<p>JSR 308 aims to bring Java a more precise type system, and to make the type system arbitrarily extensible. I'll admit that I had no idea what that meant, either. Fortunately, presenter and MIT Associate Professor Michael Ernst gave us several examples to consider.</p>

<p>The expert group sees two problems that need to be addressed. </p>

<p>The first problem is a syntactic limitation with annotations today: they can only be applied to declarations, not to every place a type is used. So, for example, we can say:</p>

<pre>
@NonNull List&lt;String&gt; strings;
</pre>

<p>If the right annotation processor is loaded, this tells the compiler that <tt>strings</tt> will never be null. The compiler can then help us enforce that by warning on any assignment that could result in <tt>strings</tt> taking on a null value.</p>

<p>Today, however, we cannot say:</p>

<pre>
@NonNull List&lt;@NonNull String&gt; strings;
</pre>

<p>This would mean that the variable <tt>strings</tt> will never take a null value, and that no list element it contains will be null.</p>

<p>Consider another example:</p>

<pre>
@NonEmpty List&lt;@NonNull String&gt; strings = ...;
</pre>

<p>This is a list whose elements may not be null. The list itself will not be empty. The compiler---more specifically, an annotation processor used by the compiler---will help enforce this.</p>

<p>They would also add the ability to annotate method receivers:</p>

<pre>
void marshal(@Readonly Object jaxbElement, @Mutable Writer writer) @Readonly { ... }
</pre>

<p>This tells the type system that <tt>jaxbElement</tt> will not be changed inside the method, that <tt>writer</tt> may be changed, and that executing <tt>marshal</tt> will not change the receiving object itself.</p>

<p>Presumably, to enforce that final constraint, marshal would only be permitted to call other methods that the compiler could verify as consistent with <tt>@Readonly</tt>. In other words, applying <tt>@Readonly</tt> to one method will start to percolate through into other methods it calls.</p>

<p>The second problem the expert group addresses is more about semantics than syntax. The compiler keeps you from making obvious errors like: </p>

<pre>
int i = &quot;JSR 308&quot;;
</pre>

<p>But, it doesn't prevent you from calling <tt>getValue().toString()</tt> when <tt>getValue()</tt> could return null. More generally, there's no way to tell the compiler that a variable is not null, immutable, interned, or tainted.</p>

<p>Their solution is to add a pluggable type system to the Java compiler. You would be able to annotate types (both at declaration and at usage) with arbitrary type qualifiers. These would be statically carried through compilation and made available to pluggable processors. Ernst showed us an example of a processor that can check and enforce not-null semantics. (Available for download from the link above.) In a sample source code base (of approximately 5,000 LOC) the team added 35 not-null annotations and suppressed 4 warnings to uncover 8 latent <tt>NullPointerException</tt> bugs.</p>
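<p>To make the gap concrete, here is a minimal sketch in plain Java, with no annotations at all. Everything in it is an assumption for illustration: the class <tt>NullCheckSketch</tt> and its hypothetical <tt>getValue(boolean)</tt> stand in for any method that may return null, and none of it comes from JSR 308 or from Ernst's checker. It only shows that the unguarded call compiles happily and fails at run time, which is the kind of mistake a not-null checker is meant to flag at compile time.</p>

```java
public class NullCheckSketch {
    // Hypothetical stand-in for the post's getValue(): it may return null.
    static Object getValue(boolean present) {
        return present ? "JSR 308" : null;
    }

    // Compiles without complaint, but throws NullPointerException
    // whenever getValue returns null.
    static String unchecked(boolean present) {
        return getValue(present).toString();
    }

    // The hand-written guard that nothing in the compiler asks for today.
    static String guarded(boolean present) {
        Object value = getValue(present);
        return value == null ? "(no value)" : value.toString();
    }

    public static void main(String[] args) {
        System.out.println(guarded(true));
        System.out.println(guarded(false));
        try {
            unchecked(false);
        } catch (NullPointerException e) {
            System.out.println("unchecked call failed at run time");
        }
    }
}
```

<p>Today the only thing separating <tt>unchecked</tt> from <tt>guarded</tt> is programmer discipline; the compiler treats both as equally valid.</p>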

<p>Significantly, Findbugs, Jlint, and PMD all missed those errors, because none of them include an inferencer that could trace all usages of the annotated types.</p>

<p>That all sounds good, right? Put the compiler to work. Let it do the tedious work tracing the extended semantics and checking them against the source code.</p>

<p>Why the Lovecraftian gibbering, then?</p>

<p>Every language has a complexity budget. Java blew through it with generics in Java 5. Now, seriously, take another look at this:</p>

<pre>
@NonEmpty List&lt;@NonNull String&gt; strings = new ArrayList&lt;@NonNull String&gt;();
</pre>

<p>Does that even look like Java? That complexity budget is just a dim smudge in our rear-view mirror here. We're so busy keeping the compiler happy here, we'll completely forget what our actual project is.</p>

<p>All this is coming at exactly the worst possible time for Java the Language. The community is really, really excited about dynamic languages now. Instead of those contortions, we could just say:</p>

<pre>
var strings = [&quot;one&quot;, &quot;two&quot;];
</pre>

<p>Now seriously, which one would you rather write? True, the dynamic version doesn't let me enlist the compiler's aid for enforcement. True, I do need many more unit tests with the dynamic code. Still, I'd prefer that &quot;low ceremony&quot; approach to the mouthful of formalism above.</p>

<p>So, getting back to that mainstream Java developer... it looks like there are only two choices: more dynamic or more static. More formal and strict, or more loosey-goosey and terse. JSR 308 will absolutely accelerate this polarization.</p>

<p>And, by the way, in case you were thinking that Java the Language might start to follow the community move toward dynamic languages, Alex Buckley, Sun's spec lead for the Java language, gave us the answer today.</p>

<p>He said, &quot;Don't look for any 'var' keywords in Java.&quot;</p>
]]></content></entry><entry><title>SOA at 3.5 Million Transactions Per Hour</title><link href="https://michaelnygard.com/blog/2008/05/soa-at-3.5-million-transactions-per-hour/"/><id>https://michaelnygard.com/blog/2008/05/soa-at-3.5-million-transactions-per-hour/</id><published>2008-05-06T15:47:01-05:00</published><updated>2008-05-06T15:47:01-05:00</updated><content type="html"><![CDATA[<p>Matthias Schorer talked about <a href="http://www.fiducia.de/" target="_blank">FIDUCIA IT AG</a> and their service-oriented architecture. This financial services provider works with 780 banks in Europe, processing 35,000,000 transactions during the banking day. That works out to a little over 3.5 million transactions per hour.</p><p>Matthias described this as a service-oriented architecture, and it is. Be warned, however, that SOA does not imply or require web services. The services here exist in the middle tier. Instead of speaking XML, they mainly use serialized Java objects. As Matthias said, &quot;if you control both ends of the communication, using XML is just crazy!&quot;</p><p>They do use SOAP when communicating out to other companies.</p><p>They've done a couple of interesting things. They favor asynchronous communication, which makes sense when you <a href="/blog/2007/11/architecting-for-latency/">architect for latency</a>. Where many systems push data into the async messages, FIDUCIA does not. Instead, they put the bulk data into storage (usually files, sometimes structured data) and send control messages instructing the middle tier to process the records. This way, large files can be split up and processed in parallel by a number of the processing nodes. Obviously, this works when records are highly independent of each other.</p><p>Second, they have defined explicit rules and regulations about when to expect transactional integrity. There are enough restrictions that these are a minority of transactions. 
In all other cases, developers are required to design for the fact that ACID properties do not hold.</p><p>Third, they've built a multi-layered middle tier. Incoming requests first hit a pair of &quot;Central Process Servers&quot; which inspect the request. Requests are dispatched to individual &quot;portals&quot; based on their customer ID. Different portals will run different versions of the software, so FIDUCIA supports customers with different production versions of their software. Instead of attempting to combine versions on a single cluster, they just partition the portals (clusters.)</p><p>Each portal has its own load distribution mechanism, using work queues that the worker nodes listen to.</p><p>This multilevel structure lets them scale to over 1,000 nodes while keeping each cluster small and manageable.</p><p>The net result is that they can process up to 2,500 transactions per second, with no scaling limit in sight.<br /></p> 
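<p>As a quick cross-check of the figures quoted in this post, here is a small sketch of the arithmetic. The class and method names are purely illustrative, and the only inputs are the numbers above: 35,000,000 transactions per banking day, roughly 3.5 million per hour, and a peak of 2,500 per second.</p>

```java
public class ThroughputCheck {
    // Convert a per-second rate into a per-hour rate.
    static long perHour(long perSecond) {
        return perSecond * 3600L;
    }

    // How many hours of processing a daily volume implies at a given hourly rate.
    static double impliedHours(long perDay, long hourlyRate) {
        return (double) perDay / hourlyRate;
    }

    public static void main(String[] args) {
        long perDay = 35000000L;      // transactions per banking day (from the talk)
        long avgPerHour = 3500000L;   // rounded hourly average (from the talk)
        long peakPerSecond = 2500L;   // quoted peak rate (from the talk)

        System.out.println("Implied banking day in hours: " + impliedHours(perDay, avgPerHour));
        System.out.println("Peak transactions per hour: " + perHour(peakPerSecond));
    }
}
```

<p>Taken at face value, the quoted peak of 2,500 per second works out to 9 million transactions per hour, comfortably above the average hourly load, and the daily and hourly figures together imply a banking day of about ten hours.</p>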
]]></content></entry><entry><title>Project Hydrazine</title><link href="https://michaelnygard.com/blog/2008/05/project-hydrazine/"/><id>https://michaelnygard.com/blog/2008/05/project-hydrazine/</id><published>2008-05-06T11:56:16-05:00</published><updated>2008-05-06T11:56:16-05:00</updated><content type="html"><![CDATA[<p>Part of Sun's push behind JavaFX will be called &quot;Project Hydrazine&quot;.&nbsp; (Hydrazine is a toxic and volatile <a href="http://en.wikipedia.org/wiki/Hydrazine">rocket fuel</a>.)&nbsp; This is still a bit fuzzy, and they only left the <a href="/blog/2007/09/engineering-in-the-white-space/">boxes-and-arrows</a> slide up for a few seconds, but here's what I was able to glean.</p><p>Hydrazine includes common federated services for discovery, personalization, deployment, location, and development. There's a &quot;cloud&quot; component to it, which wasn't entirely clear from their presentation. Overall, the goal appears to be an easier model for creating end-user applications based on a service component architecture. All tied together and presented with <a href="/blog/2008/05/javaone/">JavaFX</a>, of course.<br /></p><p>One very interesting extension---called &quot;Project Insight&quot;---that Rich Green and Jonathan Schwartz both discussed is the ability to instrument your applications to monitor end-user activity in your apps.</p><p>(This immediately reminded me of <a href="http://www.valvesoftware.com/">Valve</a>'s instrumentation of <a href="http://www.whatistheorangebox.com/">Half-Life 2, episode 2</a>. The game itself reports back to Valve on <a href="http://www.steampowered.com/status/ep2/ep2_stats.php">player stats</a>: time to complete levels, map locations where they died, play time and duration, and so on. 
Valve has previously talked about using these stats to improve their level design by finding out where players get frustrated, or quit, and redesigning those levels.)<br /></p><p>I can see this being used well: making apps more usable, proactively analyzing what features users appreciate or don't understand, and targeting development effort at improving the overall experience.</p><p>Of course, it can also be used to target advertising and monitor impressions and clicks. Rich promoted this as the way to monetize apps built using Project Hydrazine. I can see the value in it, but I'm also ambivalent about creating even more channels for advertising.</p><p>In any event, users will be justifiably anxious about their TV watching them back. It's just a little too <a href="http://www.maxheadroom.com/">Max Headroom</a> for a lot of people. Sun says that the data will only appear in the aggregate. This leads me to believe that the apps will report to a scalable, cloud-based aggregation service from which developers can get the aggregated data. Presumably, this will be run by Sun. </p><p>Unlike Apple's iron-fisted control over iPhone application delivery, Sun says they will not be exercising editorial control. According to Schwartz, Hydrazine will all be free: free in price, freely available, and free in philosophy.<br /></p> 
]]></content></entry><entry><title>JavaOne: After the Revolution</title><link href="https://michaelnygard.com/blog/2008/05/javaone-after-the-revolution/"/><id>https://michaelnygard.com/blog/2008/05/javaone-after-the-revolution/</id><published>2008-05-06T11:37:19-05:00</published><updated>2008-05-06T11:37:19-05:00</updated><content type="html"><![CDATA[<p>What happens to the revolutionaries, once they've won?</p><p>It's been about ten years since I last made the pilgrimage to JavaOne, back when Java was still being called an&nbsp; &quot;emerging technology&quot;.</p><p>Many things have changed since then. Java is now so mainstream that the early adopters are getting itchy feet and looking hard for the next big thing. (The current favorite is some flavor of dynamic language running on the JVM: Groovy, Scala, JRuby, Jython, etc.) Java, the language, has found a home inside large enterprises and their attendant consultancies and commoditized outsourcers.</p><p>We just heard Sun say that Java SE is on 91% of all PCs and laptops, 85% of mobile phones, and 100% of all Blu-Ray players. It's safe to say that the revolution is over. We won.</p><p>A couple of things haven't changed about JavaOne in the last ten years.</p><p>The crowds in Moscone are still completely absurd. There aren't lines, so much as there are tides. People ebb and flow like a <a href="http://en.wikipedia.org/wiki/Non-newtonian_fluid">non-Newtonian fluid</a>.&nbsp;</p><p>Sun still keeps a tight rein on the Message. (This control is one of the major tensions between Sun and the broader Java community.) This year, Sun's focus is clearly on JavaFX. The leading keynote talked repeatedly about &quot;all the screens of your life&quot; and said that the JavaFX runtime will be the access layer to reach your content from any device anywhere. We also heard about JavaFX's animation, 3D, audio, and video capabilities.</p><p>Glassfish got a brief mention. 
Version 3 is supposed to have a new kernel that slims down to 98KB in its minimal deployment. Add-on modules provide HTTP service, SIP service, and so on. Rich Green said that Glassfish will scale up to the data center and down to set top boxes.</p><p>Perhaps it's just my perspective, since I'm mostly a server-side developer, but I had the oddest sense of deja-vu. Instead of Rich Green in 2008, I felt the strange sense that I was listening to Scott McNealy in 1998. Same message: Java from the handset to the data center. Set top boxes. Headspace for audio. (Anyone else remember Thomas Dolby at the keynote?&nbsp; This year we got Neil Young.)</p><p>So, here we are, at the 13th JavaOne, and Sun is still trying to get developers to see Java as more than a server-side platform.&nbsp;</p><p>Well, the more things change, the more they stay the same, I suppose.</p> 
]]></content></entry><entry><title>Who Ordered That?</title><link href="https://michaelnygard.com/blog/2008/05/who-ordered-that/"/><id>https://michaelnygard.com/blog/2008/05/who-ordered-that/</id><published>2008-05-05T18:29:57-05:00</published><updated>2008-05-05T18:29:57-05:00</updated><content type="html"><![CDATA[<p><a href="/blog/2008/05/sun-to-emerge-from-behind-in-t/">Yesterday</a>, I let myself get optimistic about what Jonathan Schwartz <a href="http://gigaom.com/2008/05/04/sun-amazon-web-services/">coyly hinted about</a> over the weekend.</p><p>The actual announcement came today.&nbsp; <a href="http://www.sun.com/aboutsun/pr/2008-05/sunflash.20080505.3.xml">OpenSolaris will be available</a> on <a href="http://www.amazon.com/b/ref=sc_fe_c_1_3435361_1?ie=UTF8&amp;node=391556011&amp;no=3435361&amp;me=A36L942TSJ2AJA">EC2</a>. Honestly, I'm not sure how relevant that is. Are people actually demanding Solaris before they'll support EC2?</p>

<p>There is a message here for Microsoft, though. The only sensible license cost for a cloud-based platform is $0.00 per instance.&nbsp;</p>

<p><strong>Addendum</strong></p>

<p>I said that OpenSolaris would be available on EC2. Looks like I should have used the present tense, instead.<br /></p>

<pre>
$ ec2-describe-images -a | grep -i solaris
IMAGE	ami-8946a3e0	opensolaris.thoughtworks.com/opensolaris-mingle-2_0_8540-64.manifest.xml	089603041495	available	public		x86_64	machine	aki-ab3cd9c2	ari-2838dd41
</pre>

<p>Yep, <a href="http://www.thoughtworks.com/">ThoughtWorks</a> already has an OpenSolaris image configured as a Mingle server.</p>

<p>(I've said it before, but there's just no need to pay money for development infrastructure any more.&nbsp; Conversely, there's no excuse for any development team to run without version control, automated builds, and continuous integration.)</p> 
]]></content></entry><entry><title>Sun to Emerge from Behind in the Clouds?</title><link href="https://michaelnygard.com/blog/2008/05/sun-to-emerge-from-behind-in-the-clouds/"/><id>https://michaelnygard.com/blog/2008/05/sun-to-emerge-from-behind-in-the-clouds/</id><published>2008-05-04T20:48:25-05:00</published><updated>2008-05-04T20:48:25-05:00</updated><content type="html"><![CDATA[<p>Nobody can miss the dramatic proliferation of cloud computing platforms and initiatives over the last couple of years. All through the last year, Sun has remained oddly silent on the whole thing. There is a clear, natural synergy between Linux, commodity x86 hardware, and cloud computing. Sun is conspicuously absent from all of those markets.&nbsp; Sun clearly needs to regain relevance in this space.<br /></p><p>On the one hand, <a href="http://www.projectcaroline.net/" target="_blank">Project Caroline</a> now has its own website. Anybody can create an account that allows forum reading, but don't count on getting your hands on hardware unless you've got an idea that Sun approves of.</p><p>Apart from that, Om Malik <a href="http://gigaom.com/2008/05/04/sun-amazon-web-services/" target="_blank">reports</a> that we may see a joint announcement Monday morning from Sun and Amazon.</p><p>I suspect that the announcement will look something like this:</p><ul><li>Based on AWS for accounts, billing, storage, and infrastructure</li><li>Java-based application deployment into a Sun grid container</li><li>AWS to handle load balancing, networking, etc.</li></ul><p>In other words: it will look a lot like Project Caroline and the Google App Engine, running Java applications using Sun containers on top of AWS.</p> 
]]></content></entry><entry><title>Agile IT! Experience</title><link href="https://michaelnygard.com/blog/2008/04/agile-it-experience/"/><id>https://michaelnygard.com/blog/2008/04/agile-it-experience/</id><published>2008-04-23T14:53:10-05:00</published><updated>2008-04-23T14:53:10-05:00</updated><content type="html"><![CDATA[<p>On June 26-28, 2008, I'll be speaking at the inaugural <a target="blank_" href="http://www.agileitx.com/">Agile IT! Experience</a> symposium in Reston, VA. Agile ITX is about consistently delivering better software. It's for development teams and management, working and learning together.</p><p>It's a production of the No Fluff, Just Stuff symposium series.&nbsp; Like all NFJS events, attendance is capped, so be sure to register early.</p>

<p>From the announcement email:</p>

<p>The central theme of the Agile ITX conference (www.agileitx.com) is to help your development team/management consistently deliver better software. We'll focus on the entire software development life cycle, from requirements management to test automation to software process. You'll learn how to Develop in Iterations, Collaborate with Customers, and Respond to Change. Software is a difficult field with high rates of failure. Our world-class speakers will help you implement best practices, deal with persistent problems, and recognize opportunities to improve your existing practices.</p>

<p><strong>Dates:</strong> June 26-28, 2008</p>
<p><strong>Location:</strong> Sheraton Reston</p>
<p><strong>Attendance:</strong> Developers/ Technical Management </p>

<p>Sessions at Agile ITX will cover topics such as:</p>

<ul>
<li>Continuous Integration (CI)</li>
<li>Test Driven Development (TDD)</li>
<li>Testing Strategies, Team Building</li>
<li>Agile Architecture</li>
<li>Dependency Management</li>
<li>Code Metrics &amp; Analysis</li>
<li>Acceleration &amp; Automation</li>
<li>Code Quality</li>
</ul>

<p>Agile ITX speakers are successful leaders, authors, mentors, and trainers who have helped thousands of developers create better software. You will have the opportunity to hear and interact with:</p>

<p>
<strong>Jared Richardson</strong> - co-author of <a type="amzn" asin="0974514047">Ship It!</a><br/>
<strong>Michael Nygard</strong> - author of <a type="amzn" asin="0978739213">Release It!</a><br />
<strong>Johanna Rothman</strong> - author of <a type="amzn" asin="0978739248">Manage It!</a><br />
<strong>Esther Derby</strong> - co-author of <a type="amzn" asin="0978739248">Behind Closed Doors: Secrets of Great Management</a><br />
<strong>Venkat Subramaniam</strong> - co-author of <a type="amzn" asin="097451408X">Practices of an Agile Developer</a><br />
<strong>David Hussman</strong> - Agility Instructor/Mentor<br />
<strong>Andrew Glover</strong> - co-author of <a type="amzn" asin="0321336380">Continuous Integration</a><br />
<strong>J.B. Rainsberger</strong> - author of <a type="amzn" asin="1932394230">JUnit Recipes</a><br />
<strong>Neal Ford</strong> - Application Architect at <a href="http://www.thoughtworks.com/">ThoughtWorks</a><br />
<strong>Kirk Knoernschild</strong> - contributor to <a href="http://www.agilejournal.com/">The Agile Journal</a><br />
<strong>Chris D'Agostino</strong> - CEO of <a href="http://www.nearinfinity.com/">Near Infinity</a><br />
<strong>David Bock</strong> - Principal Consultant with CodeSherpas <br />
<strong>Mark Johnson</strong> - Director of Consulting at CGI<br />
<strong>Ryan Shriver</strong> - Managing Consultant with <a href="http://www.dominiondigital.com/">Dominion Digital</a><br />
<strong>John Carnell</strong> -  IT Architect at <a href="http://www.thrivent.com/">Thrivent Financial</a><br />
<strong>Scott Davis</strong> - Testing Expert
</p> 
]]></content></entry><entry><title>Amazon Blows Away Objections</title><link href="https://michaelnygard.com/blog/2008/04/amazon-blows-away-objections/"/><id>https://michaelnygard.com/blog/2008/04/amazon-blows-away-objections/</id><published>2008-04-14T07:59:31-05:00</published><updated>2008-04-14T07:59:31-05:00</updated><content type="html"><![CDATA[<p>Amazon must have been burning more midnight oil than usual lately.</p><p>Within the last two weeks, they've announced three new features that basically eliminate any remaining objections to their AWS computing platform.</p><h2>Elastic IP Addresses&nbsp;</h2><p>Elastic IP addresses solve a major problem on the front end.&nbsp; When an EC2 instance boots up, the &quot;cloud&quot; assigns it a random IP address. (Technically, it assigns two: one external and one internal.&nbsp; For now, I'm only talking about the external IP.) With a random IP address, you're forced to use some kind of dynamic DNS service such as <a href="http://www.dyndns.org">DynDNS</a>. That lets you update your DNS entry to connect your long-lived domain name with the random IP address.<br /></p><p>Dynamic DNS services work pretty well, but not universally well.&nbsp; For one thing, there is a small amount of delay.&nbsp; Dynamic DNS works by setting a very short time-to-live (TTL) on the DNS entries, which instructs intermediate DNS servers to cache the entry only for a few minutes.&nbsp; When that works well, you still have a few minutes of downtime when you need to reassign your DNS name to a new IP address.&nbsp; For some parts of the Net, dynamic DNS doesn't work well, usually when some ISP doesn't respect the TTL on DNS entries, but caches them for a longer time.</p><p>Elastic IP addresses solve this problem. 
You request an elastic IP address through a Web Services call.&nbsp; The easiest way is with the command-line API:</p><p>$ ec2-allocate-address<br />ADDRESS&nbsp;&nbsp;&nbsp; 75.101.158.25&nbsp;&nbsp;&nbsp; <br /></p><p>Once the address is allocated, you own it until you release it. At this point, it's attached to your account, not to any running virtual machine. Still, this is good enough to go update your domain registrar with the new address. After you start up an instance, then you can attach the address to the machine. If the machine goes down, then the address is detached from that instance, but you still &quot;own&quot; it.</p><p>So, for a failover scenario, you can reassign the elastic IP address to another machine, leave your DNS settings alone, and all traffic will now come to the new machine.</p><p>Now that we've got elastic IPs, there's just one piece missing from a true HA architecture: load distribution. With just one IP address attached to one instance, you've got a single point of failure (SPOF). Right now, there are two viable options to solve that. First, you can allocate multiple elastic IPs and use round-robin DNS for load distribution. Second, you can attach a single elastic IP address to an instance that runs a software load balancer: <a href="http://www.apsis.ch/pound/">pound</a>, <a href="http://nginx.net/">nginx</a>, or Apache+<a href="http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html">mod_proxy_balancer</a>. (It wouldn't surprise me to see Amazon announce an option for load-balancing-in-the-cloud soon.) You'd run two of these, with the elastic IP attached to one at any given time. Then, you need a third instance monitoring the other two, ready to flip the IP address over to the standby instance if the active one fails. 
(There are already some <a href="http://code.google.com/p/scalr/">open-source</a> and <a href="http://www.rightscale.com/">commercial products</a> to make this easy, but that's the subject for another post.)</p><h2>Availability Zones&nbsp;</h2><p>The second big gap that Amazon closed recently deals with geography.</p><p>In the first rev of EC2, there was absolutely no way to control where your instances were running. In fact, there wasn't any way inside the service to even <em>tell</em> where they were running. (You had to resort to pingtracing or geomapping of the IPs). This presents a problem if you need high availability, because you really want more than one location.</p><p>Availability Zones let you specify where your EC2 instances should run. You can get a list of them through the command-line (which, let's recall, is just a wrapper around the web services):</p><p>$ ec2-describe-availability-zones <br />AVAILABILITYZONE&nbsp;&nbsp;&nbsp; us-east-1a&nbsp;&nbsp;&nbsp; available<br />AVAILABILITYZONE&nbsp;&nbsp;&nbsp; us-east-1b&nbsp;&nbsp;&nbsp; available<br />AVAILABILITYZONE&nbsp;&nbsp;&nbsp; us-east-1c&nbsp;&nbsp;&nbsp; available<br /></p><p>Amazon tells us that each availability zone is built independently of the others. That is, they might be in the same building or separate buildings, but they have their own network egress, power systems, cooling systems, and security. Beyond that, Amazon is pretty opaque about the availability zones. In fact, not every AWS user will see the same availability zones. They're mapped per account, so &quot;us-east-1a&quot; for me might map to a different hardware environment than it does for you.<br /></p><p>How do they come into play? Pretty simply, as it turns out. 
When you start an instance, you can specify which availability zone you want to run it in.</p><p>Combine these two features, and you get a bunch of <a href="http://blog.rightscale.com/2008/03/26/setting-up-a-fault-tolerant-site-using-amazons-availability-zones/">interesting deployment and management options</a>.</p><h2>Persistent Storage</h2><p>Storage has been one of the most perplexing issues with EC2. Simply put, anything you stored to disk while your instance was running would be lost when you restarted the instance. Instances always go back to the bundled disk image stored on S3.</p><p>Amazon has just announced that they will be supporting persistent storage in the near future. A few lucky users get to try it out now, in its pre-beta incarnation.</p><p>With persistent storage, you can allocate space in chunks from 1 GB to 1 TB.&nbsp; That's right, you can make one web service call to allocate a freaking terabyte! Like IP addresses, storage is owned by your account, not by an individual instance. Once you've started up an instance---say a MySQL server, for example---you attach the storage volume to it. To the virtual machine, the storage looks just like a device, so you can use it raw or format it with whatever filesystem you want.</p><p>Best of all, because this is basically a virtual SAN, you can do all kinds of SAN tricks, like snapshot copies for backups to S3.</p><p>Persistent storage done this way obviates some of the other dodgy efforts that have been going on, like&nbsp; FUSE-over-S3, or the S3 storage engine for MySQL.</p><p>SimpleDB is still there, and it's still much more scalable than plain old MySQL data storage, but we've got scores of libraries for programming with relational databases, and very few that work with key-value stores. For most companies, and for the foreseeable future, programming to a relational model will be the easiest thing to do. 
This announcement really lowers the barrier to entry even further.</p><p>&nbsp;</p><p>With these announcements, Amazon has cemented AWS as a viable computing platform for real businesses. <br /></p> 
]]></content></entry><entry><title>Geography Imposes Itself On the Clouds</title><link href="https://michaelnygard.com/blog/2008/04/geography-imposes-itself-on-the-clouds/"/><id>https://michaelnygard.com/blog/2008/04/geography-imposes-itself-on-the-clouds/</id><published>2008-04-09T07:54:04-05:00</published><updated>2008-04-09T07:54:04-05:00</updated><content type="html"><![CDATA[<p>In a comment to my last post, <a href="http://www.third-bit.com/">gvwilson</a> asks, &quot;Are you aware that the PATRIOT Act means it's illegal for companies based in Ontario, BC, most European jurisdictions, and many other countries to use S3 and similar services?&quot;<br /><br /></p><p>This is another interesting case of the non-local networked world intersecting with real geography. Not surprisingly, it quickly becomes complex.&nbsp;</p><p>I have heard some of the discussion about S3 and the interaction between the U.S. PATRIOT act and the EU and Canadian privacy laws. I'm not a lawyer, but I'll relate the discussion for other readers who haven't been tracking it.</p><p>Canada and the European Union have privacy laws that lean toward their citizens, and are quite protective of them. In the U.S., where laws are written about privacy at all, they are heavily biased in favor of large data-collecting corporations, such as credit rating agencies.&nbsp; A key provision of the privacy laws in Canada and the EU is that companies cannot transmit private data to any jurisdiction that lacks substantially similar protections. It's kind of like the &quot;incorporation&quot; clause in the GPL that way.</p><p>In the U.S., particularly with respect to the USA PATRIOT act, companies are required to turn over private customer data to a variety of government agencies. In some cases, they are required to do this even without a search warrant or court order. These are pretty much just fishing expeditions; casting a broad net to see if you catch anything. 
Therefore, the EU/Canadian privacy laws judge that the U.S. does <em>not</em> have substantially similar privacy protections, and companies in those covered nations are barred from exporting, transmitting, or storing customer data in any U.S. location where they might be subject to PATRIOT act search.<br /><br />(Strictly speaking, this is not just a PATRIOT act problem. It also relates to RICO and a wide variety of other U.S. laws, mostly aimed at tracking down drug dealers by their banking transactions.)<br /><br />Enter <a href="http://aws.amazon.com/s3">S3</a>. S3 is built to be a geographically-replicated distributed storage mechanism! There is no way even to figure out where the individual bits of your data are physically located. Nor is there any way to tell Amazon what legal jurisdictions your data can, or must, reside in. This is a big problem for personal customer data. It's also a problem that Amazon is aware they must solve. For <a href="http://aws.amazon.com/ec2">EC2</a>, they recently introduced Availability Zones that let you define what geographic location your virtual servers will exist in. I would expect to see something similar for S3.<br /><br />This would also appear to be a problem for EU and Canadian companies using Google's <a href="http://code.google.com/appengine" target="_blank">AppEngine</a>. It does not offer any way to confine data to specific geographies, either.<br /></p><p>Does this mean it's illegal for Canadian companies to use S3? Not in general. Web pages, software downloads, media files... these would all be allowed.&nbsp; Just stay away from the personal data.</p> 
]]></content></entry><entry><title>Suggestions for a 90-minute app</title><link href="https://michaelnygard.com/blog/2008/04/suggestions-for-a-90-minute-app/"/><id>https://michaelnygard.com/blog/2008/04/suggestions-for-a-90-minute-app/</id><published>2008-04-08T21:10:44-05:00</published><updated>2008-04-08T21:10:44-05:00</updated><content type="html"><![CDATA[<p>Some of you know my obsession with Lean, Agile, and ToC.&nbsp; Ideas are everywhere.&nbsp; Idea is nothing. Execution is everything.<br /> </p><p>In that vein, one of my <a href="http://www.nofluffjuststuff.com" target="_blank">No Fluff, Just Stuff</a> talks is called &quot;The 90 Minute Startup&quot;.&nbsp; In it, I build a real, live dotcom site during the session. You can't get a much shorter time-to-market than 90 minutes, and I really like that.</p><p>In case you're curious, I do it through the use of Amazon's <a target="_blank" href="http://aws.amazon.com/ec2">EC2</a> and <a target="_blank" href="http://aws.amazon.com/s3">S3</a> services.&nbsp;</p><p>The app I've used for the past couple of sessions is a quick and dirty GWT app that implements a <a href="http://en.wikipedia.org/wiki/Net_promoter_score" target="_blank">Net Promoter Score</a> survey about the show itself. It has a little bit of AJAX-y stuff to it, since GWT makes that really, really simple. On the other hand, it's not all that exciting as an application. It certainly doesn't make anyone sit up and go &quot;Wow!&quot;</p><p>So, anyone want to offer up a suggestion for a &quot;Wow!&quot; app they'd like to see built and deployed in 90 minutes or less?&nbsp; Since this is for a talk, it should be about the size of one user story. I doubt I'll be taking live requests from the audience during the show, but I'm happy to take suggestions here in the comments.</p><p>(Please note: thanks to the pervasive evil of blog comment spam, I moderate all comments here. 
If you want to make a suggestion, but don't want it published, just make a note of that in the comment.)&nbsp;</p><p><br /></p> 
]]></content></entry><entry><title>Google's AppEngine Appears, Disappoints</title><link href="https://michaelnygard.com/blog/2008/04/googles-appengine-appears-disappoints/"/><id>https://michaelnygard.com/blog/2008/04/googles-appengine-appears-disappoints/</id><published>2008-04-08T09:37:02-05:00</published><updated>2008-04-08T09:37:02-05:00</updated><content type="html"><![CDATA[<p>Google finally got into the cloud infrastructure game, announcing their <a target="_blank" href="http://code.google.com/appengine/">Google AppEngine</a>. As rumored, AppEngine opens parts of Google's legendary scalable infrastructure for hosted applications.</p><p>AppEngine is in beta, with only 10,000 accounts available. They're already long gone, but you can <a target="_blank" href="http://code.google.com/appengine/downloads.html">download the SDK</a> and run a local container.</p><p>Here are some quick pros and cons:</p><p><strong>Pro</strong></p><ul><li>Dynamically scalable</li><li>Good lifecycle management</li><li>Quota-based management for cost containment</li></ul><p><strong>Con</strong></p><ul><li>Python apps only</li><li>You deploy code, not virtual machines</li><li>Web apps only</li></ul><p>At this point, I'm a bit underwhelmed. Essentially, they're providing a virtual scalable app runtime, but not a generalized computing platform. (Similar to Sun's <a href="/blog/2008/02/sun-joining-the-cloud-crowd/">Project Caroline</a>.) Access to the really cool Google features, like GFS, is through Python APIs that Google provides.</p><p>If you fit Google's profile of a Python-based Web application developer, this could be a very fast path to market with dynamic scalability.&nbsp; Still, I think I'm going to stick with <a target="_blank" href="http://aws.amazon.com">Amazon Web Services</a>, instead.&nbsp;</p> 
]]></content></entry><entry><title>Reality</title><link href="https://michaelnygard.com/blog/2008/03/reality/"/><id>https://michaelnygard.com/blog/2008/03/reality/</id><published>2008-03-26T22:51:58-05:00</published><updated>2008-03-26T22:51:58-05:00</updated><content type="html"><![CDATA[<p align="center"><a href="/images/blog/ecom/current_reality_tree.png"><img border="0" style="float: none;" width="676" height="279" src="/images/blog/ecom/current_reality_tree.png"/></a></p> 
]]></content></entry><entry><title>OmniFocus Coming to the iPhone</title><link href="https://michaelnygard.com/blog/2008/03/omnifocus-coming-to-the-iphone/"/><id>https://michaelnygard.com/blog/2008/03/omnifocus-coming-to-the-iphone/</id><published>2008-03-18T23:37:25-05:00</published><updated>2008-03-18T23:37:25-05:00</updated><content type="html"><![CDATA[<p><img style="float: none;" width="590" height="430" border="0" src="/images/blog/OmniFocus/OmniFocusiPhoneApp.png" /></p>

<p>Over the last six months, I've grown thoroughly dependent on OmniFocus. It's a "Getting Things Done" application that lets me juggle more projects, personal and professional, than I ever thought I could.</p>

<p>Now, Omni says they're going to bring <a target="_blank" href="https://blog.omnigroup.com/2008/03/06/omnifocus-coming-for-the-iphone/">OmniFocus to the iPhone</a>. So far, the iPhone hasn't compelled me, but I think that will be the trigger.</p>
]]></content></entry><entry><title>Release It has won a Jolt Productivity award</title><link href="https://michaelnygard.com/blog/2008/03/release-it-has-won-a-jolt-productivity-award/"/><id>https://michaelnygard.com/blog/2008/03/release-it-has-won-a-jolt-productivity-award/</id><published>2008-03-06T21:55:10-06:00</published><updated>2008-03-06T21:55:10-06:00</updated><content type="html"><![CDATA[<p>It's an honor and a thrill for me to report that Release It received a <a href="http://www.sdexpo.com/2008/west/press/jolt.htm" target="_blank">Jolt Productivity</a> award!</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Steve Jobs Made Me Miss My Flight</title><link href="https://michaelnygard.com/blog/2008/03/steve-jobs-made-me-miss-my-flight/"/><id>https://michaelnygard.com/blog/2008/03/steve-jobs-made-me-miss-my-flight/</id><published>2008-03-06T13:25:16-06:00</published><updated>2008-03-06T13:25:16-06:00</updated><content type="html"><![CDATA[<p><strong><em>Or</em>: On my way to San Jose.</strong></p><p>On waking, I reach for my blackberry. It tells me what city I'm in; the hotel rooms offer no clues. Every Courtyard by Marriott is interchangeable.&nbsp; Many doors into the same house. From the size of my suitcase, I can recall the length of my stay: one or two days, the small bag.&nbsp; Three or four, the large. Two bags means more than a week.</p><p>CNBC, shower, coffee, email. Quick breakfast, $10.95 (except in California, where it's $12.95. Another clue.)</p><p>Getting there is the worst part. Flying is an endless accumulation of indignities. Airlines learned their human factors from hospitals. I've adapted my routine to minimize hassles. </p><p>Park in the same level of the same ramp. Check in at the less-used kiosks in the transit level. Check my bag so I don't have to fuck around with the overhead bins. I'd rather dawdle at the carousel than drag the thing around the terminal anyway.</p><p>Always the frequent flyer line at the security checkpoint. Sometimes there's an airline person at the entrance of that line to check my boarding pass, sometimes not. An irritation. I'd rather it was always, or never. Sometimes means I don't know if I need my boarding pass out or not.</p><p>Same words to the TSA agent.&nbsp; Standard responses. &quot;Doing fine,&quot; whether I am or not.&nbsp; Same belt.&nbsp; It's gone through the metal detector every time. I don't need to take it off.</p><p>Only... today, something is different. Instead of my bags trundling through the x-ray machine, she stops the belt.&nbsp; Calls over another agent, a palaver. 
Another agent flocks to the screen. A gabble, a conference, some consternation.</p><p>They pull my laptop, my new laptop making its first trip with me, out of the flow of bags. One takes me aside to a partitioned cubicle. Another of the endless supply of TSA agents takes the rest of my bags to a different cubicle. No yellow brick road here, just a pair of yellow painted feet on the floor, and my flight is boarding. I am made to understand that I should stand and wait.&nbsp; My laptop is on the table in front of me, just beyond reach, like I am waiting to collect my personal effects after being paroled.</p><p>I'm standing, watching my laptop on the table, listening to security clucking just behind me. &quot;There's no drive,&quot; one says. &quot;And no ports on the back. It has a couple of lines where the drive should be,&quot; she continues.</p><p>A younger agent joins the crew. I must now be occupying ten, perhaps twenty, percent of the security force. At this checkpoint anyway. There are three score more at the other five checkpoints. The new arrival looks at the printouts from x-ray, looks at my laptop sitting small and alone. He tells the others that it is a real laptop, not a &quot;device&quot;. That it has a solid-state drive instead of a hard disc. They don't know what he means. He tries again, &quot;Instead of a spinning disc, it keeps everything in flash memory.&quot; Still no good. &quot;Like the memory card in a digital camera.&quot; He points to the x-ray, &quot;Here. That's what it uses instead of a hard drive.&quot;</p><p>The senior agent hasn't been trained for technological change. New products on the market? They haven't been TSA approved. Probably shouldn't be permitted. He requires me to open the &quot;device&quot; and run a program. 
I do, and despite his inclination, the lead agent decides to release me and my troublesome laptop.&nbsp; My flight is long gone now, so I head for the service center to get rebooked.</p><p>Behind me, I hear the younger agent, perhaps not realizing that even the TSA must obey TSA rules, repeating himself.</p><p>&quot;It's a MacBook Air.&quot; <br /></p> 
]]></content></entry><entry><title>The Granularity Problem</title><link href="https://michaelnygard.com/blog/2008/02/the-granularity-problem/"/><id>https://michaelnygard.com/blog/2008/02/the-granularity-problem/</id><published>2008-02-20T19:24:16-06:00</published><updated>2008-02-20T19:24:16-06:00</updated><content type="html"><![CDATA[<p>I spend most of my time dealing with large sites. They're always hungry for more horsepower, especially if they can serve more visitors with the same power draw. Power goes up much faster with more chassis than with more CPU cores. Not to mention, administrative overhead tends to scale with the number of hosts, not the number of cores. For them, multicore is a dream come true.<br /></p><p>I ran into an interesting situation the other day, on the other end of the spectrum.</p><p>One of my team was working with a client that had relatively modest traffic levels. They're in a mature industry with a solid, but not rabid, customer base. Their web traffic needs could easily be served by one Apache server running one CPU and a couple of gigs of RAM.</p><p>The smallest configuration we could offer, and still maintain SLAs, was two hosts, with a total of 8 CPU cores running at 2 GHz, 32 gigs of RAM, and 4 fast Ethernet ports.</p><p>Of course that's oversized! Of course it's going to cost more than it should! But at this point in time, if we're talking about dedicated boxes, <em>that's the smallest configuration we can offer</em>! (Barring some creative engineering, like using fully depreciated &quot;classics&quot; hardware that's off its original lease, but still has a year or two before EOL.)</p><p>As CPUs get more cores, the minimum configuration is going to become more and more powerful. The quantum of computing is getting large.</p><p>Not every application will need it, and that's another reason I think <a target="_blank" href="/blog/2008/02/a-cloud-for-everyone-1/">private clouds</a> make a lot of sense. 
Companies can buy big boxes, then allocate them to specific applications in fractions. They gain cost efficiency in administration, power, and space consumption (though not heat production!) while still letting business units optimize their capacity downward to meet their actual demand.&nbsp;</p> 
]]></content></entry><entry><title>Sun Joining the Cloud Crowd</title><link href="https://michaelnygard.com/blog/2008/02/sun-joining-the-cloud-crowd/"/><id>https://michaelnygard.com/blog/2008/02/sun-joining-the-cloud-crowd/</id><published>2008-02-20T19:11:19-06:00</published><updated>2008-02-20T19:11:19-06:00</updated><content type="html"><![CDATA[<p>As I was writing my <a href="/blog/2008/02/a-cloud-for-everyone-1/">last post</a>, I somehow missed the news that <a href="http://www.theregister.co.uk/2008/02/15/caroline_sun_amazon/" target="_blank">Sun is building their own cloud platform</a>, called Project Caroline.</p><p>There's a <a target="_blank" href="http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-1991.pdf">PDF about it</a>. It appears to be a presentation for JavaOne.&nbsp; It may be locked down at any minute, so the link might not work by the time you read this.</p><p>Caroline looks a lot like Amazon EC2, but with some very nice control over VLANs (I suppose they would be Virtual VLANs?), load balancing policies, and DNS... all things that EC2 lacks today. ZFS instead of S3, that will make for a more familiar storage model. No trickery needed to make data persist across restarts.</p><p>All in all, it looks very nice.</p><p>(Hmmm.&nbsp; On second glance, this presentation is from JavaOne 2007!&nbsp; Not much of a scoop there, Reg.)</p><p>Does anyone know what happened to this project?&nbsp;</p><p>&nbsp;&nbsp;</p><p> &nbsp;</p> 
]]></content></entry><entry><title>A Cloud For Everyone</title><link href="https://michaelnygard.com/blog/2008/02/a-cloud-for-everyone/"/><id>https://michaelnygard.com/blog/2008/02/a-cloud-for-everyone/</id><published>2008-02-20T18:17:57-06:00</published><updated>2008-02-20T18:17:57-06:00</updated><content type="html"><![CDATA[<p>The trajectory of many high-tech products looks like this:</p><ol><li>Very expensive. Only a few exist in the world. They are heavily time-shared, and usually oversubscribed.</li><li>Within the reach of institutions and corporations, but not individuals. The organization wants to maximize utilization.<br /></li><li>Corporations own many, as productivity enhancers, some wealthy or forward-looking individuals own one. Families time share theirs.<br /></li><li>Virtually everyone has one. To lack one is to fall behind. No longer a competitive advantage, the lack of the technology puts one at a disadvantage.</li><li>Invisibility. Most people have or use several, but are not aware of it.</li></ol><p>Depending on your age, you might have been thinking &quot;cell phones&quot;, &quot;computers&quot;, or even &quot;televisions&quot;.&nbsp; I don't think I have any blog readers old enough to have been thinking &quot;telephones&quot;, &quot;telegraphs&quot;, or &quot;electric motors&quot;, but they all went through the same stages, too.</p><p>I feel very comfortable putting &quot;cloud computing&quot; in that list, too. Cloud computing is at stage 1. It's expensive enough that there are a few in the world: <a target="_blank" href="http://aws.amazon.com">Amazon AWS</a>, Mosso, <a target="_blank" href="http://www.bungeeconnect.com/">BungeeConnect</a>, even <a target="_blank" href="http://www.force.com/">Force.com</a>. 
They're shared, multitenant, and soon to be oversubscribed.</p><p>One day, I suspect that we'll each have our own computing cloud attending us, formed out of the many computing devices that surround us every day, but I'm getting ahead of myself.<br /></p><p>Before that, we'll see enterprises, first large then medium and small, building their own computing clouds.</p><p>&quot;Wait a minute,&quot; you object. &quot;That misses the whole point of cloud computing. The entire purpose is to <em>not </em>own the infrastructure.&quot;</p><p>That's true, today. It was also true, at one time, that farmers did not want to own their own steam engines. So, they outsourced the job. Farmers would own machines like threshers that had everything except the troublesome boiler and engine. Those required technical expertise to run, so the farmers left that job up to folks who would bring their steam engine around, hook it up to the thresher, and charge the farmer for the length of time he needed it. As steam engines got cheaper and safer, they eventually got built <a target="_blank" href="http://www.rollag.com/">right into the thresher</a>.<br /></p><p>This next part may sound like FUD. It isn't. I like cloud computing. I like virtualization. In fact, I think it's about to revolutionize our industry.</p><p>I like it so much that I think every company should have one.</p><p>Why should a company build its own cloud, instead of going to one of the providers? Several reasons, some positive, some not so much.</p><p>On the positive side, an IT manager running a cloud can finally do real chargebacks to the business units that drive demand. Some do today, but on a larger-grained level... whole servers. With a private cloud, the IT manager could charge by the compute-hour, or by the megabit of bandwidth. He could charge for storage by the gigabyte, with tiered rates for different availability/continuity guarantees. 
Even better, he could allow the business units to do the kind of self-service that I can do today with a credit card and <a href="http://www.theplanet.com/" target="_blank">The Planet</a>. (OK, The Planet isn't a cloud provider, but I bet they're thinking about it.&nbsp; Plus, I like them.)</p><p>I actually think this kind of self-service and fine-grained chargeback could help curb the out-of-control growth in IT spending, but that's a different post.</p><p>This would seriously raise the level of discourse. Instead of fighting about server classes, rack space, power consumption, and rampant storage sprawl, IT could talk to the business about levels of service. Does this app need 24x7 performance management with automatic resource allocation to maintain a 2 second response time? Great, we can do that! This other one doesn't need to be fast, but it had better work every single time a transaction goes through? We can do that, too! This application needs user experience monitoring, that database only needs non-redundant storage, because it can be recreated from other sources... it's a better conversation to have than, &quot;No, our corporate standard is WebSphere running on RedHat Enterprise Linux 4, with Dell PowerEdge servers.&nbsp; You can have any server you want, as long as it's a Dell PowerEdge.&quot;</p><p>I also think that the gloss will come off of the cloud computing providers. (I know, most people still haven't heard of them yet, but the gloss will inevitably come off.)</p><p>Accidents happen. Networks still break, today, and they will in the future too. Power failures happen. How would you defend yourself in a shareholders' lawsuit after millions in losses thanks to a service provider failure? (Actually, that suggests there may be an insurance market developing here. Any time you've got quantifiable risk and someone willing to pay to defray that risk, sure as hell, you'll find insurance companies.)<br /></p><p>Service providers get oversubscribed. 
What happens when your application is slow, and remains slow for months? Having an SLA only means you get some money back, it doesn't mean your problem will get fixed. It's a dirty secret that some service providers are quite happy paying out credits, if they can avoid bigger costs. What's your recourse? Transition costs. It costs a lot.</p><p>Latency matters. It might matter more today than ever before, since most internal applications have gone to web interfaces. Keeping your endpoints on your own network at least lets you control your own latency.&nbsp;</p><p>Then there's security. Many of my clients are dealing with <a href="https://www.pcisecuritystandards.org/" target="_blank">PCI</a> audits and compliance. I have no idea what they'd say if I suggested moving their data into the cloud. I'm pretty certain I wouldn't still be in the room to hear what they said. I'd probably be standing outside in the rain, trying to catch a cab back to the airport.<br /></p><p>Like I said, I'm not trying to FUD cloud computing. I think that it's so good that every company should have one.</p><p>There's one more reason I think it makes sense to build internal clouds. I'll talk about that in my next post.&nbsp;</p> 
]]></content></entry><entry><title>Outrunning Your Headlights</title><link href="https://michaelnygard.com/blog/2008/02/outrunning-your-headlights/"/><id>https://michaelnygard.com/blog/2008/02/outrunning-your-headlights/</id><published>2008-02-19T15:13:29-06:00</published><updated>2008-02-19T15:13:29-06:00</updated><content type="html"><![CDATA[<p>Agile developers measure their velocity. Most teams define velocity as the number of story points delivered per iteration. Since the size of a &quot;story point&quot; and the length of an iteration vary from team to team, there's not much use in comparing velocity from one team to the next. Instead, the team tracks its own velocity from iteration to iteration.</p>

<p>Tracking velocity has two purposes. The first is estimation. If you know how many story points are left for this release, and you know how many points you complete per iteration, then you know how long it will be until you can release. (This is the &quot;burndown chart&quot;.) After two or three iterations, this will be a much better projection of release date than I've ever seen any non-agile process deliver.</p>
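<p>As a rough sketch of that projection: the arithmetic is just remaining points divided by velocity, rounded up to whole iterations. The function name, the point counts, and the two-week iteration length below are assumptions for illustration, not figures from a real project.</p>

```python
import math


def iterations_remaining(points_left, velocity):
    """Whole iterations needed to finish the remaining story points."""
    if not velocity > 0:
        raise ValueError("velocity must be positive")
    return math.ceil(points_left / velocity)


# Hypothetical release: 120 points left, team averages 25 points per iteration.
iterations = iterations_remaining(120, 25)
weeks = iterations * 2  # assuming two-week iterations
print(iterations, "iterations, about", weeks, "weeks")
```

<p>Rounding up matters here: a partly finished iteration still has to run to its end before the release can ship.</p>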

<p>The second purpose of velocity tracking is to figure out ways to go faster.</p>

<p>In the iteration retrospective, a team does two things. First, they'll recalibrate their estimating technique, to see how well they can actually estimate the story cards or backlog items. Second, they'll look at ways to accomplish more during an iteration. Maybe that's refactoring part of the code, or automating some manual process. It might be as simple as adding templates to the IDE for commonly recurring code patterns.&nbsp; (That should always raise a warning flag, since recurring code patterns are a code smell. Some languages just won't let you completely eliminate them, though.&nbsp; And by &quot;some languages&quot; here, I mainly mean Java.)</p>

<p>Going faster should always be better, right? That means the development team is delivering more value for the same fixed cost, so it should always be a benefit, shouldn't it?</p>

<p>I have an example of a case where going faster didn't matter. To see why, we need to look past the boundaries of the development team.&nbsp; Developers often treat software requirements as if they come from a sort of ATM; there's an unlimited reserve of requirements and we just need to decide how many of them to accept into development.</p>

<p>Taking a cue from "Lean Software Development", though, we can look at the end-to-end value stream. The value stream is drawn from the customer's perspective. Step by step, the value stream map shows us how raw materials (requirements) are turned into finished goods. &quot;Finished goods&quot; does not mean code. Code is inventory, not finished goods. A finished good is something a customer would buy. Customers don't buy code. On the web, customers are users, interacting with a fully deployed site running in production. For shrink-wrapped software, customers buy a CD, DVD, or installer from a store. Until the inventory is fully transformed into one of these finished goods, the value stream isn't done.</p>

<p>Figure 1 shows a value stream map for a typical waterfall development process. This process has an annual funding cycle, so &quot;inventory&quot; from &quot;suppliers&quot; (i.e., requirements from the business unit) wait, on average, six months to get funded. Once funded and analyzed, they enter the development process. For clarity here, I've shown the development process as a single box, with 100% efficiency. That is, all the time spent in development is spent adding value---as the customer perceives it---to the product. Obviously, that's not true, but we'll treat it as a momentarily convenient fiction. Here, I'm showing a value stream map for a web site, so the final steps are staging and deploying the release.</p>

<span><a rel="nofollow" href="/images/blog/headlights/value_stream_waterfall.png" target="_blank"><img style="float:none;" width="709" height="272" border="1" title="Value Stream Map of Waterfall Process (click to enlarge)" alt="Value Stream Map of Waterfall Process" src="/images/blog/headlights/value_stream_waterfall.png" /></a><p style="text-align: center"><span style="font-style: italic">Figure 1 - Value Stream Map of a Waterfall Process</span></p></span>

<p>This is not a very efficient process. It takes 315 business days to go from concept to cash. Out of that time, at most 30% of it is spent adding value. In reality, if we unpack the analysis and development processes, we'll see that efficiency drop to around 5%.</p>
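<p>The efficiency figure is simply value-adding time divided by total lead time. Here is a minimal sketch of that calculation. The step names echo the discussion above, but the day counts are made up for illustration; they are not read off Figure 1.</p>

```python
# (step name, business days waiting in queue, business days of value-adding work)
# Made-up numbers for illustration; they are not read off Figure 1.
STEPS = [
    ("Funding", 120, 0),
    ("Analysis", 20, 10),
    ("Development and Testing", 30, 60),
    ("Staging", 5, 3),
    ("Deployment", 1, 1),
]


def process_efficiency(steps):
    """Fraction of total lead time that is spent adding value."""
    waiting = sum(wait for _, wait, _ in steps)
    working = sum(work for _, _, work in steps)
    return working / (waiting + working)


print(f"{process_efficiency(STEPS):.1%} of lead time adds value")
```

<p>Splitting a step like Development into its own wait and work segments only adds rows to the list, which is how the efficiency number falls as you unpack the process.</p>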

<p>From the "Theory of Constraints", we know that the throughput of any process is limited by exactly one constraint. An easy way to find the constraint is by looking at queue sizes. In an unoptimized process, you almost always find the largest queue right before the constraint. In the factory environment that ToC came from, it's easy to see the stacks of WIP (work in progress) inventory. In a development process, WIP shows up in SCR systems, requirements spreadsheets, prioritization documents, and so on.</p>

<p>Indeed, if we overlay the queues on that waterfall process, as in Figure 2, it's clear that Development and Testing is the constraint. After Development and Testing completes, Staging and Deployment take almost no time and have no queued inventory.</p>

<span><a rel="nofollow" href="/images/blog/headlights/value_stream_waterfall_with_queues.png" target="_blank"><img style="float:none;" width="709" height="272" border="1" title="Waterfall Value Stream, With Queues (click to enlarge)" alt="Waterfall Value Stream, With Queues" src="/images/blog/headlights/value_stream_waterfall_with_queues.png" /></a><p style="text-align: center"><span style="font-style: italic">Figure 2 - Waterfall Value Stream, With Queues</span></p></span>

<p>In this environment, it's easy to see why development teams get flogged constantly to go faster, produce more, catch up.&nbsp; They're the constraint.</p>
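<p>The queue-size heuristic is easy to sketch: count the work waiting in front of each step and pick the step with the biggest pile. The counts below are hypothetical, and in practice you would pull them from whatever tracking system holds your WIP.</p>

```python
def likely_constraint(queue_sizes):
    """Return the step with the largest queue of work waiting in front of it."""
    return max(queue_sizes, key=queue_sizes.get)


# Hypothetical counts of work items waiting in front of each step.
queues = {
    "Analysis": 12,
    "Development and Testing": 85,
    "Staging": 0,
    "Deployment": 0,
}
print(likely_constraint(queues))
```

<p>This is only a first guess at the constraint. It works best on an unoptimized process, where nobody has yet been managing the queues.</p>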

<p>Lean Software Development has <a href="https://www.poppendieck.com/lean.htm" target="_blank">ten simple rules</a> to optimize the entire value stream.</p>

<p>ToC says to elevate the constraint and subordinate the entire process to the throughput of the constraint. Elevating the constraint---by either going faster with existing capacity, or expanding capacity---adds to throughput, while running the whole process at the throughput of the constraint helps reduce waste and WIP.</p>

<p>In a certain sense, Agile methods can be derived from Lean and ToC.</p>

<p>All of that, though, presupposes a couple of things:</p>
<ol>
<li>Development is the constraint.</li>
<li>There's an unlimited supply of requirements.</li>
</ol>

<p>Figure 3 shows the value stream map for a project I worked on in 2005. This project was to replace an existing system, so at first, we had a large backlog of stories to work on. As we approached feature parity, though, we began to run out of stories. The users had been waiting for this system for so long that they hadn't given much thought, or at least recent thought, to what they might want after the initial release. Shortly after the second release (a minor bug fix), it became clear that we were actually consuming stories faster than they could be produced.</p>

<span><a rel="nofollow" href="/images/blog/headlights/value_stream_agile.png" target="_blank"><img style="float:none;" width="709" height="272" border="1" src="/images/blog/headlights/value_stream_agile.png" alt="Value Stream of an Agile Process" title="Value Stream of an Agile Process (click to enlarge)" /></a><p style="text-align: center"><span style="font-style: italic">Figure 3 - Value Stream Map of an Agile Project</span></p></span>

<p>On the output side, we ran into the reverse problem. This desktop software would be distributed to hundreds of locations, with over a thousand users who needed to be expert on the software in short order. The internal training group, responsible for creating manuals and computer based training videos, could not keep revising their training modules as quickly as we were able to change the application. We could create new user interface controls, metaphors, and even whole screens much faster than they could create training materials.</p>

<p>Once past the training group, a release had to be mastered and replicated onto installation discs. These discs were distributed to the store locations, where associates would call the operations group for a &quot;talkthrough&quot; of the installation process. Operations had a finite capacity, and could only handle so many installations every day. That set a natural throttle on the rate of releases. At one stage---after I rolled off the project---I know that a release which had passed acceptance testing in October was still in the training group by the following March.</p>

<p>In short, the development team wasn't the constraint. There was no point in running faster. We would exhaust the inventory of requirements and build up a huge queue of WIP in front of training and deployment. The proper response would be to slow down, to avoid the buildup of unfinished inventory.&nbsp; Creating slack in the workday would be one way to slow down, and drawing down the team size would be another. It would be just as valid to increase the capacity of the training team. There are other places to optimize the value stream, too. But the one thing that absolutely wouldn't help would be increasing the development team's velocity.</p>
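<p>A toy model makes the point concrete. Treat each step as having a fixed capacity and the whole process as no faster than its slowest step. The step names and the stories-per-month numbers are hypothetical, chosen only to mirror the shape of this project.</p>

```python
def throughput(capacities):
    """A linear process delivers no faster than its slowest step."""
    return min(capacities.values())


# Hypothetical capacities, in stories per month.
process = {"Requirements": 10, "Development": 30, "Training": 6, "Deployment": 8}
baseline = throughput(process)

faster_dev = dict(process, Development=60)    # double development velocity
bigger_training = dict(process, Training=12)  # double training capacity

print(baseline, throughput(faster_dev), throughput(bigger_training))
```

<p>Doubling development leaves the result unchanged, while doubling training raises it only until deployment becomes the new constraint.</p>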

<p>For nearly the entire history of software development, there has been talk of the &quot;<a href="https://en.wikipedia.org/wiki/Software_crisis">software crisis</a>&quot;, the ever-widening gap between government and industry's need for software and the rate at which software can be produced. For the first time in that history, agile methods allow us to move the constraint off of the development team.</p>
]]></content></entry><entry><title>Software Failure Takes Down Blackberry Services</title><link href="https://michaelnygard.com/blog/2008/02/software-failure-takes-down-blackberry-services/"/><id>https://michaelnygard.com/blog/2008/02/software-failure-takes-down-blackberry-services/</id><published>2008-02-13T08:41:16-06:00</published><updated>2008-02-13T08:41:16-06:00</updated><content type="html"><![CDATA[<p>Anyone who's addicted to a Blackberry already knows about Monday's four-hour outage. For some of us, the Blackberry isn't just an electronic leash, it's part of our business operations.</p><p>Like cell phones, Blackberries have a huge, hidden infrastructure behind them. Corporate Blackberry Enterprise Servers (BES) relay email, calendar, and contact information through RIM's infrastructure, out through the wireless carriers. It was RIM's own infrastructure that suffered from intermittent failures during the outage.</p><p><a href="http://www.datacenterknowledge.com" target="_blank">Data Center Knowledge</a> reports that the outage was caused by a <a href="http://www.datacenterknowledge.com/archives/2008/Feb/13/software_upgrade_cited_in_blackberry_outage.html" target="_blank">failed software upgrade</a>.&nbsp;</p><p>Releases are risky. We use testing and QA to reduce the risk, but every line of new or modified code represents an unknown.</p><p>How can we reduce the risk of an upgrade? One way is to roll it out slowly. Companies with widely distributed point-of-sale (POS) systems know this. They never push a release out to every store at once. They start with one or two. If that works, they go up to a larger handful, maybe four to eight. After a couple of days, they'll roll it out to an entire district. It can take a week or more to roll the release out everywhere.</p><p>In the interim, there are plenty of checkpoints where the release can be rolled back.</p><p>I strongly recommend approaching Web site releases the same way. 
Roll the new release out to one or two servers in your farm. Let a fraction of your customers into the new release. Watch for performance regressions, capacity problems, and functional errors. Absolutely ensure that you can roll it back if you need to. Once it's &quot;baked&quot; for a while in production, then roll it to the remaining app servers.</p><p>This approach demands a few corollaries. First, your database updates have to be structured in a forward-compatible way, and they must always allow for rollback. There can be no irrevocable updates. Second, two versions of your software will be operating simultaneously. That means your integration protocols and static assets have to be able to accommodate both versions. I discuss specific strategies for each of these aspects in <a href="https://pragprog.com/titles/mnee2/">Release It</a>.</p><p>Finally, an aside: RIM's statement about the outage isn't reflected anywhere <a target="_blank" href="http://www.researchinmotion.com/">on their site</a>. Once again, if what you want is the latest true information about a company, the very last place to find it is the company's own web site.&nbsp;</p> 
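<p>The staged rollout described earlier in this post can be sketched as a loop with a checkpoint after every stage. This is only an outline under stated assumptions: <code>deploy</code>, <code>healthy</code>, and <code>rollback</code> are hypothetical callables you would supply, the server names are invented, and a real health check would let the release bake for a while before answering.</p>

```python
def staged_rollout(stages, deploy, healthy, rollback):
    """Deploy stage by stage; roll everything back at the first bad checkpoint.

    Returns True if every stage went out and stayed healthy, else False.
    """
    done = []
    for stage in stages:
        deploy(stage)
        done.append(stage)
        if not healthy(stage):
            # Undo the most recent stage first, then work backwards.
            for finished in reversed(done):
                rollback(finished)
            return False
    return True


# Hypothetical server groups: one server, then two, then the rest.
stages = [("app01",), ("app02", "app03"), ("app04", "app05", "app06", "app07")]
log = []
ok = staged_rollout(
    stages,
    deploy=lambda s: log.append(("deploy", s)),
    healthy=lambda s: "app03" not in s,  # pretend app03 shows a regression
    rollback=lambda s: log.append(("rollback", s)),
)
print(ok, log)
```

<p>In this run the second stage fails its checkpoint, so both finished stages are rolled back and the largest group is never touched.</p>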
]]></content></entry><entry><title>Tim Ross' C# Circuit Breaker</title><link href="https://michaelnygard.com/blog/2008/02/tim-ross-c-circuit-breaker/"/><id>https://michaelnygard.com/blog/2008/02/tim-ross-c-circuit-breaker/</id><published>2008-02-10T17:18:00-06:00</published><updated>2008-02-10T17:18:00-06:00</updated><content type="html"><![CDATA[<p>Tim Ross has published <a href="http://timross.wordpress.com/2008/02/10/implementing-the-circuit-breaker-pattern-in-c/" target="_blank">his implementation</a> of the Circuit Breaker pattern from <a href="https://pragprog.com/titles/mnee2/" target="_blank">Release It</a>, complete with unit tests.</p><p>I barely speak C#, so I'm not in any position to review his implementation, but I'm delighted to see it!</p><p>&nbsp;</p> 
]]></content></entry><entry><title>The Pragmatic Architect on Security</title><link href="https://michaelnygard.com/blog/2008/02/the-pragmatic-architect-on-security/"/><id>https://michaelnygard.com/blog/2008/02/the-pragmatic-architect-on-security/</id><published>2008-02-06T11:45:07-06:00</published><updated>2008-02-06T11:45:07-06:00</updated><content type="html"><![CDATA[<p>Catching up on some reading, I finally got a chance to read Ted Neward's article <a href="http://msdn2.microsoft.com/en-us/library/bb245797.aspx" target="_blank">&quot;Pragmatic Architecture: Security&quot;</a>.&nbsp; It's very good.&nbsp; (Actually, the whole series is pretty good, and I recommend them all.&nbsp; At least as of February 2008... I make no assertions about future quality!)</p><p>Ted nails it.&nbsp; I agree with all of the principles he identifies, and I particularly like his advice to &quot;fail securely&quot;.&nbsp; </p><p>I would add one more, though: Be visible.</p><p>After any breach, the three worst questions are always:</p><ol><li>How long has this been happening?</li><li>How much have we lost?</li><li>Why didn't we know about it sooner?</li></ol><p>The answers are always, respectively, &quot;Far too long&quot;, &quot;We have no idea&quot;, and &quot;We didn't expect that exploit&quot;. To which the only possible response is, &quot;Well, duh, if you'd expected it, you would have closed the vulnerability.&quot;</p><p>Successful exploits are always successful because they stay hidden. Are you sure that nobody's in your systems right now, leaching data, stealing credit card numbers, or stealing products? Of course not. For a vivid case in point, google &quot;Kerviel Societe Generale&quot;.</p><p>While you cannot prove a negative, you can improve your odds of detecting nefarious activity by making sure that everything interesting is logged. 
(And by &quot;interesting&quot;, I mean &quot;potentially valuable&quot;.)&nbsp;</p><p>There are some pretty spiffy event correlation tools out there these days. They can monitor logs across hundreds of servers and network devices, extracting patterns of anomalous behavior. But, they only work if your application exposes data that could indicate a breach.</p><p>For example, you might not be able to log every login attempt, but you probably should log every admin login attempt.</p><p>Or, you might consider logging every price change. (I shudder to think about collusion between a merchant with pricing control and an outside buyer.&nbsp; Imagine a 10-minute long sale on laptops: 90% off for 10 minutes only.)</p><p>If your internal web service listens on a port, then it should only accept connections from known sources. Whether you enforce that through IPTables, a hardware firewall, or inside the application itself, make sure you're logging refused connections.</p><p>Then, of course, once you're logging the data, make sure someone's monitoring it and keeping pattern and signature definitions up to date!<br /></p> 
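<p>As one small illustration of logging refused connections inside the application itself, here is a sketch using Python's standard <code>ipaddress</code> and <code>logging</code> modules. The logger name, the allowlisted network, and the function name are all assumptions for the example.</p>

```python
import ipaddress
import logging

log = logging.getLogger("security")

# Hypothetical allowlist of known callers for an internal web service.
KNOWN_SOURCES = [ipaddress.ip_network("10.20.0.0/16")]


def accept_connection(remote_ip):
    """Accept only known sources, and log every refusal."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in KNOWN_SOURCES):
        return True
    log.warning("refused connection from %s", remote_ip)
    return False


print(accept_connection("10.20.3.4"), accept_connection("192.0.2.7"))
```

<p>The refusal is logged at warning level so that a correlation tool watching the logs has something to match against.</p>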
]]></content></entry><entry><title>Two Books That Belong In Your Library</title><link href="https://michaelnygard.com/blog/2008/01/two-books-that-belong-in-your-library/"/><id>https://michaelnygard.com/blog/2008/01/two-books-that-belong-in-your-library/</id><published>2008-01-19T16:50:07-06:00</published><updated>2008-01-19T16:50:07-06:00</updated><content type="html"><![CDATA[<p>I seldom plug books---other than my own, that is. I've just read two important books, however, that really deserve your attention.</p>

<h2>Concurrency, Everybody's Doing It</h2>

<p>The first is "Java Concurrency in Practice" by Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, and Doug Lea. I've been doing Java development for close to thirteen years now, and I learned an enormous amount from this fantastic book. For example, I knew what the textbook definition of a volatile variable was, but I never knew why I would actually want to use one. Now I know when to use them and when they won't solve the problem.</p>

<p>Of course, JCP talks about the Java 5 concurrency library at great length. But this is no paraphrasing of the javadoc. (It was Doug Lea's original concurrency utility library that eventually got incorporated into Java, and we're all better off for it.) The authors start with illustrations of real issues in concurrent programming. Before they introduce the concurrency utilities, they explain a problem and illustrate potential solutions. (Usually involving at least one naive &quot;solution&quot; that has serious flaws.) Once they show us some avenues to explore, they introduce some neatly-packaged, well-tested utility class that either solves the problem or makes a solution possible. This removes the utility classes from the realm of &quot;inscrutable magic&quot; and presents them as &quot;something difficult that you don't have to write.&quot;</p>

<p>The best part about JCP, though, is the combination of thoroughness and clarity with which it presents a very difficult subject. For example, I always understood about the need to avoid concurrent modification of mutable state. But, thanks to this book, I also see why you have to synchronize getters, not just setters. (Even though assignment to an integer is guaranteed to happen atomically, that isn't enough to guarantee that the change is visible to other threads. The only way to guarantee ordering is by crossing a synchronization barrier <em>on the same lock</em>.)</p>

<p>Blocked Threads are one of my stability antipatterns. I've seen hundreds of web site crashes. Every single one of them eventually boils down to blocked threads somewhere. Java Concurrency in Practice has the theory, practice, and tools that you can apply to avoid deadlocks, livelocks, corrupted state, and a host of other problems that lurk in the most innocuous-looking code.</p>

<h2>Capacity Planning is Science, Not Art</h2>

<p>The second book that I want to recommend today is "Capacity Planning for Web Services". I've had this book for a while. When I first started reading it, I put it down right away thinking, &quot;This is way too basic to solve any real problems.&quot; That was a big error.</p>

<p>Capacity Planning may get off to a slow start, but that's only because the authors are both thorough and deliberate. Later in the book, that deliberate pace is very helpful, because it lets us follow the math.</p>

<p>This is the only book on capacity planning I've seen that actually deals with transmission time for HTTP requests and responses. In fact, some of the examples even compute the number of packets that a request or reply will need.</p>

<p>I have objected to some capacity planning books because they assume that every process can be represented by an average. Not this one. In the section on standalone web servers, for example, the authors break files into several classes, then use a weighted distribution of file sizes to compute the expected response time and bandwidth requirements. This is a very real-world approach, since web requests tend toward a bimodal distribution: small HTML, JavaScript, and CSS intermixed with large media files and images. (In fact, I plan on using the models in this book to quantify the effect of segregating media files from dynamic pages.)</p>
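<p>To give a feel for the weighted-classes idea, here is a simplified sketch. It is not the book's model: the class boundaries, request shares, and sizes are hypothetical, and it ignores headers and protocol overhead.</p>

```python
# (class name, share of requests, mean size in kilobytes) -- hypothetical values.
FILE_CLASSES = [
    ("HTML, JavaScript, CSS", 0.80, 10),
    ("Images", 0.15, 100),
    ("Large media", 0.05, 2000),
]


def expected_size_kb(classes):
    """Request-weighted mean response size across file classes."""
    return sum(share * size for _, share, size in classes)


def bandwidth_mbps(classes, requests_per_second):
    """Outbound bandwidth needed for response bodies, in megabits per second."""
    kilobits = expected_size_kb(classes) * 8 * requests_per_second
    return kilobits / 1000


print(expected_size_kb(FILE_CLASSES), "KB per request on average")
print(bandwidth_mbps(FILE_CLASSES, 50), "Mbit/s at 50 requests per second")
```

<p>With these numbers, the 5% of requests for large media account for most of the bandwidth, which is exactly what a single overall average hides.</p>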

<p>This is also the only book I've seen that recognizes that capacity limits can propagate both downward and upward through tiers. There's a great example of how doubling the CPU performance in an app tier ends up increasing the demand on the database server, which almost totally nullifies the effect of the CPU upgrade. It also recognizes that all requests are not created equal, and recommends clustering request types by their CPU and I/O demands, instead of averaging them all together.</p>

<p>Nearly every result or abstract law has an example, written in concrete terms, which helps bridge theory and practice.</p>

<p>Both of these books deal with material that easily leads off into clouds of theory and abstraction. (JCP actually quips, &quot;What's a memory model, and why would I want one?&quot;) These excellent works avoid the Ivory Tower trap and present highly pragmatic, immediately useful wisdom.</p>
]]></content></entry><entry><title>Well Begun Is Half Done</title><link href="https://michaelnygard.com/blog/2008/01/well-begun-is-half-done/"/><id>https://michaelnygard.com/blog/2008/01/well-begun-is-half-done/</id><published>2008-01-15T23:49:56-06:00</published><updated>2008-01-15T23:49:56-06:00</updated><content type="html"><![CDATA[<p>How long is your checklist for setting up a new development environment? It might seem like a trivial thing, but setup costs are part of the overall friction in your project. I've seen three page checklists that required multiple downloads, logging in as several users (root and non-root), and hand-typing SQL strings to set up the local database server.</p>

<p>I think the paragon of environment setup is the ubiquitous GNU autoconf system. Anyone familiar with Linux, BSD, or other flavors of UNIX will surely recognize this three-line incantation:</p>
<pre>
./configure
make
make install
</pre>

<p>The beauty of autoconf is that it adapts to you. In the open-source world, you can't stipulate one particular set of packages or versions, at least, not if you actually want people to use your software and contribute to your project. In the corporate world, though, it's pretty common to see a project that requires a specific point-point rev of some Jakarta Commons library, but without actually documenting the version.</p>

<p>Then there are different places to put things: inside the project, in source control, or in the system. I recently went back to a project's code base after being away for more than two years. I thought we had done a good job of addressing the environment setup. We included all the deliverable jars in the codebase, so they were all version controlled. But, we decided to keep the development-only jars (like EasyMock, DBUnit, and JUnit) outside the code base. We did use Eclipse variables to abstract out the exact filesystem location, but when I returned to that code base, finding and restoring exactly the right versions of those build-time jars wasn't easy. In retrospect, we should have put the build-time jars under version control and kept them inside the code base.</p>

<p>Yes, I know that version control systems aren't good at versioning binaries like jar files. Who cares? We don't rev the jar files so often that the lack of deltas matters. Putting a new binary in source control when you upgrade from Spring 2.5 to Spring 2.5.1 really won't kill your repository. The cost of the extra disk space is nothing compared to the benefit of keeping your code base self-contained.</p>

<p>Maven users will be familiar with another approach. On a Maven project, you express external dependencies in a project model file. On the first build, Maven will download those dependencies from their &quot;official&quot; archives, then cache them locally. After that, Maven will just use the locally cached jar file, at least until you move your declared dependency to a newer revision. I have nothing against Maven. I know some people who swear by it, and others who swear at it. Personally, I just never got into it.</p>

<p>Then there are JRE extensions. This project uses JAI, which wants to be installed inside the JRE itself. We went along with that, but I was stumped for a while today when I saw hundreds of compile errors even though my Eclipse project's build path didn't show any unresolved dependencies. Of course, when you install JAI inside the JRE, it just becomes part of the Java runtime. That makes it an implicit dependency. I eventually remembered that trick, but it took a while. In retrospect, I wish we had tried harder to bring JAI's jars and native libraries into the code base as an explicit dependency.</p>

<p>Does developer environment setup time matter? I believe it does. It might be tempting to say, &quot;That's a one-time cost, there's no point in optimizing it.&quot; It's not really a one-time cost, though. It's one time per developer, every time that developer has to reinstall. My rough observation says that, between migrating to a new workstation, Windows reinstalls, corporate re-imaging, and developer churn, you should expect three to five developer setups per year on an internal project. </p>

<p>For an open-source project, the sky is the limit. Keep in mind that you'll lose potential contributors at every barrier they encounter. Environment setup is the first one.</p>

<p>So, what's my checklist for a good environment setup checklist?</p>

<ul>
<li>Keep the project self contained. Bring all dependencies into the code base. Same goes for RPMs or third-party installers.</li>

<li>Make sure all JAR files have version numbers in their file names. If the upstream project doesn't build their JAR files with version numbers, go ahead and rename the jars.</li><li>Make bootstrap scripts for database actions such as user creation or schema builds.</li>

<li>If you absolutely must embed a dependency on something that lives outside the code base, make your build script detect its location. Don't rely on specific path names.</li>

<li>Don't assume your code base is in any particular filesystem on the build machine.</li>

</ul>
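<p>For the rule about detecting locations instead of relying on path names, a build script might do something like the sketch below. The tool name <code>sqlplus</code>, the environment variable <code>SQLPLUS_BIN</code>, and the function name are placeholders for illustration.</p>

```python
import os
import shutil


def find_tool(name, env_var):
    """Locate an external tool without hard-coding a path.

    Checks an override environment variable first, then searches PATH.
    Returns the path as a string, or None if the tool cannot be found.
    """
    override = os.environ.get(env_var)
    if override and os.path.isfile(override):
        return override
    return shutil.which(name)


# "sqlplus" and "SQLPLUS_BIN" are placeholder names for illustration.
tool = find_tool("sqlplus", "SQLPLUS_BIN")
if tool is None:
    print("sqlplus not found; set SQLPLUS_BIN or add it to PATH")
else:
    print("using", tool)
```

<p>Failing with a message that says how to fix the problem is part of the point: the next developer should not need the checklist to find out what is missing.</p>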

<p>I'd love to hear your own rules for easy development setup.</p> 
]]></content></entry><entry><title>"Release It" is a Jolt Award Finalist</title><link href="https://michaelnygard.com/blog/2008/01/release-it-is-a-jolt-award-finalist/"/><id>https://michaelnygard.com/blog/2008/01/release-it-is-a-jolt-award-finalist/</id><published>2008-01-13T15:31:06-06:00</published><updated>2008-01-13T15:31:06-06:00</updated><content type="html"><![CDATA[<p>The <a href="http://www.joltawards.com/" target="_blank">Jolt Awards</a> have been described as &quot;the Oscar's of our industry&quot;. (Really. It's on the front page of the site.)&nbsp; The list of <a href="http://www.joltawards.com/history/" target="_blank">past book winners</a> reads like an essential library for the software practitioner. Even the finalists and runners-up are essential reading.</p><p><a href="https://pragprog.com/titles/mnee2/" target="_blank">Release It</a> has now joined the company of <a href="http://www.joltawards.com/finalists.html" target="_blank">finalists</a>. The competition is very tough... I've read &quot;<a href="https://www.amazon.com/gp/product/0596510047?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0596510047">Beautiful Code</a>&quot; and &quot;<a href="https://www.amazon.com/gp/product/0978739248?ie=UTF8&tag=michaelnygard-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0978739248">Manage It!</a>&quot;, and both are excellent. I'll be on pins and needles until the awards ceremony on March 5th.&nbsp; Honestly, though, I'm just thrilled to be in such good company.<br /></p> 
]]></content></entry><entry><title>Should Email Errors Keep Customers From Buying?</title><link href="https://michaelnygard.com/blog/2008/01/should-email-errors-keep-customers-from-buying/"/><id>https://michaelnygard.com/blog/2008/01/should-email-errors-keep-customers-from-buying/</id><published>2008-01-06T22:31:54-06:00</published><updated>2008-01-06T22:31:54-06:00</updated><content type="html"><![CDATA[<p>Somewhere inside every commerce site, there's a bit of code sending emails out to customers.&nbsp; Email campaigning might have been in the requirements and that email code stands tall at the brightly-lit service counter.&nbsp; On the other hand, it might have been added as an afterthought, languishing in some dark corner with the &quot;lost and found&quot; department.&nbsp; Either way, there's a good chance it's putting your site at risk.</p><p>The simplest way to code an email sending routine looks something like this:</p><ol><li>Get a javax.mail.Session instance</li><li>Get a javax.mail.Transport instance from the Session<br /></li><li>Construct a javax.mail.internet.MimeMessage instance<br /></li><li>Set some fields on the message: from, subject, body.&nbsp; (Setting the body may involve reading a template from a file and interpolating values.)<br /></li><li>Set the recipients' Addresses on the message</li><li>Ask the Transport to send the message</li><li>Close the Transport</li><li>Discard the Session<br /></li></ol><p>This goes into a servlet, a controller, or a stateless session bean, depending on which MVC framework or JEE architecture blueprint you're using.</p><p>There are two big problems here. (Actually, there are three, but I'm not going to deal with the &quot;one connection per message&quot; issue.)<br /></p><h2>Request-Handling Threads at Risk<br /></h2><p>As written, all the work of sending the email happens on the request-handling thread that's also responsible for generating the response page. 
Even on a sunny day, that means you're spending some precious request-response cycles on work that doesn't help build the page.</p><p>You should always look at a call out to an external server with suspicion. Many of them can execute asynchronously to page generation. Anything that you can offload to a background thread, you should offload so the request-handler can get back in the pool sooner. The user's experience will be better, and your site's <a href="/blog/2007/11/two-ways-to-boost-your-flaggin/">capacity will be better</a>, if you do.</p><p>Also, keep in mind that SMTP servers aren't always 100% reliable. Neither are the DNS servers that point you to them. That goes double if you're connecting to some external service. (And please, <em>please</em> don't <em>even</em> tell me you're looking up the recipient's MX record and contacting the receiving MTA directly!)</p><p>If the MTA is slow to accept your connection, or to process the email, then the request-handling thread could be blocked for a long time: seconds or even minutes. Will the user wait around for the response? Not likely. He'll probably just hit &quot;reload&quot; and double-post the form that triggered the email in the first place.</p><h2>Poor Error Recovery</h2><p>The second problem is the complete lack of error recovery.&nbsp; Yes, you can log an exception when your connection to the MTA fails. But that only lets the administrator know that some amount of mail failed. It doesn't say <em>what the mail was</em>! There's no way to contact the users who didn't get their messages. Depending on what the messages said, that could be a very big deal.<br /></p><p>At a minimum, you'd like to be able to detect and recover from interruptions at the MTA---scheduled maintenance, Windows patching, unscheduled index rebuilds, and the like. 
Even if &quot;recovery&quot; means someone takes the users' info from the log file and types in a new message on their desktops, that's better than nothing.</p><h2>A Better Way</h2><p>The good news is that there's a handy way to address both of these problems at once. Better still, it works whether you're dealing with internal SMTP based servers or external XML-over-HTTP bulk mailers.</p><p>Whenever a controller decides it's time to reach out and touch a user through email, it should drop a message on a JMS queue. This lets the request-handling thread continue with page generation immediately, while leaving the email for asynchronous processing.</p><p>You can either go down the road of message-driven beans (MDB) or you can just set up a pool of background threads to consume messages from the queue. On receipt of a message, the subscriber just executes the same email generation and transmission as before, with one exception. If the message fails due to a system error, such as a broken socket connection, the message can just go right back onto the message queue for later retry. (You'll probably want to update the &quot;next retry time&quot; to avoid livelock.)</p><h2>Better Still</h2><p>If you have a cluster of application servers that can all generate outbound email, why not take the next step? Move the MDBs out into their own app server and have the message queues from all the app servers terminate there? (If you're using pub-sub instead of point-to-point, this will be pretty much transparent.) This application will resemble a message broker... for good reason. It's essentially just pulling messages in from one protocol, transforming them, then sending them out over another protocol. </p><p>The best part? You don't even have to write the message broker yourself. 
There are plenty of <a href="http://openadaptor.org/" target="_blank">open-source</a> and commercial alternatives.</p><h2>Summary</h2><p>Sending email directly from the request-handling thread performs poorly, creates unpredictable page latency for users and risks dropping their emails right on the floor. It's better to drop a message in a queue for asynchronous transformation by a message broker: it's faster, more reliable, and there's less code for you to write. <br /></p> 
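<p>The queue-and-retry idea can be sketched in a few lines. This is only an in-memory stand-in, not JMS: the class <tt>OutboundMailQueue</tt>, its parameters, and the <tt>send</tt> callback are all hypothetical names, and Python is used purely for illustration. The request handler calls <tt>enqueue</tt> and returns at once; a background worker calls <tt>drain</tt>, and a system error pushes the message's next retry time into the future rather than losing it.</p>

```python
import heapq
import itertools


class OutboundMailQueue:
    """In-memory stand-in for the message queue described above (hypothetical)."""

    def __init__(self, retry_delay=60.0, max_attempts=5):
        self._heap = []                # entries: (next_try_time, seq, attempts, message)
        self._seq = itertools.count()  # tie-breaker so messages are never compared
        self.retry_delay = retry_delay
        self.max_attempts = max_attempts
        self.dead_letters = []         # messages that now need a human to look at them

    def enqueue(self, message, now):
        # Called on the request-handling thread: cheap, no network I/O.
        heapq.heappush(self._heap, (now, next(self._seq), 0, message))

    def drain(self, send, now):
        """Called by a background worker: attempt every message that is due."""
        sent = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, attempts, message = heapq.heappop(self._heap)
            try:
                send(message)
                sent += 1
            except OSError:
                # System error (for example a broken socket): keep the message
                # and move its next retry time forward to avoid livelock.
                attempts += 1
                if attempts >= self.max_attempts:
                    self.dead_letters.append(message)
                else:
                    retry_at = now + self.retry_delay
                    heapq.heappush(
                        self._heap, (retry_at, next(self._seq), attempts, message)
                    )
        return sent
```

<p>Because the failed message itself is retained, whether in the queue or in <tt>dead_letters</tt>, the administrator can see <em>what the mail was</em>, which is exactly what a logged exception alone cannot provide.</p>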
]]></content></entry><entry><title>Two Sites, One Antipattern</title><link href="https://michaelnygard.com/blog/2007/12/two-sites-one-antipattern/"/><id>https://michaelnygard.com/blog/2007/12/two-sites-one-antipattern/</id><published>2007-12-20T20:36:19-06:00</published><updated>2007-12-20T20:36:19-06:00</updated><content type="html"><![CDATA[<p>This week, I had Groundhog Day in December.&nbsp; I was visiting two different clients, but they each told the same tale of woe.</p><p>At my first stop, the director of IT told me about a problem they had recently found and eliminated.</p><p>They're a retailer. Like many retailers, they try to increase sales through &quot;upselling&quot; and &quot;cross-selling&quot;. So, when you go to check out, they show you some other products that you might want to buy.&nbsp; It's good to show customers relevant products that are also advantageous to sell. <br />For example, if a customer buys a big HDTV, offer them cables (80% margin) instead of DVDs (3% margin).</p><p>All but one of the slots on that page are filled through deliberate merchandising. People decide what to display there, the same way they decide what to put in the endcaps or next to the register in a physical store. The final slot, though, gets populated automatically according to the products in the customer's cart. 
Based on the original requirements for the site, the code to populate that slot looked for products in the catalog with similar attributes, then sorted through them to find the &quot;best&quot; product.&nbsp; (Based on some balance of closely-matched attributes and high margin, I suspect.)<br /></p><p>The problem was that there were too many products that would match.&nbsp; The attributes clustered too much for the algorithm, so the code for this slot would pull back thousands of products from the catalog.&nbsp; It would turn each row in the result set into an object, then weed through them in memory.</p><p>Without that slot, the page would render in under a second.&nbsp; With it, two minutes, or worse.<br /></p><p>It had been present for more than two years. You might ask, &quot;How could that go unnoticed for two years?&quot; Well, it didn't, of course. But, because it had always been that way, most everyone was just used to it. When the wait times would get too bad, this one guy would just restart app servers until it got better.</p><p>Removing that slot from the page not only improved their stability, it vastly increased their capacity. Imagine how much more they could have added to the bottom line if they <em>hadn't </em>overspent for the last two years to compensate.&nbsp;</p><p>At my second stop, the site suffered from serious stability problems. At any given time, it was even odds that at least one app server would be vapor locked. Three to five times a day, that would ripple through and take down all the app servers. One key symptom was a sudden spike in database connections.</p><p>Some nice work by the DBAs revealed a query from the app servers that was taking <em>way</em> too long. No query from a web app should ever take more than half a second, but this one would run for 90 seconds or more. Usually that means the query logic is bad.&nbsp; In this case, though, the logic was OK, but the query returned 1.2 million rows. 
The app server would doggedly convert those rows into objects in a Vector, right up until it started thrashing the garbage collector. Eventually, it would run out of memory, but in the meantime, it held a lot of row locks.&nbsp; All the other app servers would block on those row locks.&nbsp; The team applied a band-aid to the query logic, and those crashes stopped.</p><p>What's the common factor here? It's what I call an &quot;Unbounded Result Set&quot;.&nbsp; Neither of these applications limited the amount of data they requested, even though there certainly were limits to how much they could process.&nbsp; In essence, both of these applications trusted their databases.&nbsp; The apps weren't prepared for the data to be funky, weird, or oversized. They assumed too much.<br /></p><p>You should make your apps paranoid about their data. &nbsp; If your app processes one record at a time, then looping through an entire result set might be OK---as long as you're not making a user wait while you do.&nbsp; But if your app turns rows into objects, then it had better be very selective about its SELECTs.&nbsp; The relationships might not be what you expect.&nbsp; The data producer might have changed in a surprising way, particularly if it's not under your control.&nbsp; Purging routines might not be in place, or might have gotten broken.&nbsp; Definitely don't trust some other application or batch job to load your data in a safe way.</p><p>No matter what odd condition your app stumbles across in the database, it should not be vulnerable.</p> 
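<p>One way to be paranoid, sketched here under assumptions: cap how many rows the application will ever turn into objects, and notice when the cap was hit. The helper name <tt>fetch_bounded</tt> and the <tt>MAX_ROWS</tt> value are hypothetical, and Python with SQLite stands in for whatever language and database you actually use.</p>

```python
import sqlite3

MAX_ROWS = 100  # illustrative cap; choose one your page can actually use


def fetch_bounded(conn, sql, params=(), max_rows=MAX_ROWS):
    """Run a query but never materialize more than max_rows rows."""
    cursor = conn.execute(sql, params)
    # Ask for one extra row so "exactly max_rows" and "too many" look different.
    rows = cursor.fetchmany(max_rows + 1)
    cursor.close()
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated
```

<p>A caller that sees <tt>truncated</tt> come back true knows the data has grown past what the page was designed for and can log it, rather than discovering the problem when the app server runs out of memory. Putting a row limit in the SQL itself as well is a sensible second line of defense, since it also spares the database the work.</p>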
]]></content></entry><entry><title>Read-write splitting with Oracle</title><link href="https://michaelnygard.com/blog/2007/12/read-write-splitting-with-oracle/"/><id>https://michaelnygard.com/blog/2007/12/read-write-splitting-with-oracle/</id><published>2007-12-12T11:07:49-06:00</published><updated>2007-12-12T11:07:49-06:00</updated><content type="html"><![CDATA[<p>Speaking of <a href="/blog/2007/12/budgetecture-and-its-ugly-cous/">databases</a> and <a href="/blog/2007/11/two-ways-to-boost-your-flaggin/">read/write splitting</a>, Oracle had a session at OpenWorld about it.</p><p>Building a read pool of database replicas isn't something I usually think of doing with Oracle, mainly due to their non-zero license fees.&nbsp; It changes the scaling equation.</p><p>Still, if you are on Oracle and the fees work for you, consider Active Data Guard.&nbsp;&nbsp; Some key facts from the slides:</p><ul><li>Average latency for replication was 1 second</li><li>The maximum latency spike they observed was 10 seconds.</li><li>A node can take itself offline if it detects excessive latency.</li><li>You can use DBLinks to allow applications to think they're writing to a read node.&nbsp; The node will transparently pass the writes through to the master.</li><li>This can be done without any tricky JDBC proxies or load-balancing drivers, just the normal Oracle JDBC driver with the bugs we all know and love.</li><li>Active Data Guard requires Oracle 11g. <br /></li></ul> 
]]></content></entry><entry><title>Budgetecture and it's ugly cousins</title><link href="https://michaelnygard.com/blog/2007/12/budgetecture-and-its-ugly-cousins/"/><id>https://michaelnygard.com/blog/2007/12/budgetecture-and-its-ugly-cousins/</id><published>2007-12-12T09:38:17-06:00</published><updated>2007-12-12T09:38:17-06:00</updated><content type="html"><![CDATA[<p>It's the time of year for family gatherings, so here's a repulsive group portrait of some nearly universal pathologies. Try not to read this while you're eating. <br /></p><h2>Budgetecture <br /></h2><p>We've all been hit with budgetecture.&nbsp; That's when sound technology choices go out the window in favor of cost-cutting. The conversation goes something like this.</p><p>&quot;Do we <em>really need</em> X?&quot; asks the project sponsor. (A.k.a. the gold owner.)</p><p>For &quot;X&quot;, you can substitute nearly anything that's vitally necessary to make the system run: software licenses, redundant servers, offsite backups, or power supplies.&nbsp; It's always asked with a sort of paternalistic tone, as though the grown-up has caught us blowing all our pocket money on comic books and bubble gum, whilst the serious adults are trying to get on with buying more buckets to carry their profits around in.</p><p>The correct way to answer this is &quot;Yes.&nbsp; We do.&quot;&nbsp; That's almost never the response.</p><p>After all, we're trained as engineers, and engineering is all about making trade-offs. We know good and well that you don't really <em>need</em> extravagances like power supplies, so long as there's a sufficient supply of hamster wheels and cheap interns in the data center.&nbsp; So instead of simply saying, &quot;Yes. 
We do,&quot; we go on with something like, &quot;Well, you could do without a second server, provided you're willing to accept downtime for routine maintenance and whenever a RAM chip gets hit by a cosmic ray and flips a bit, causing a crash, but if we get error-checking parity memory then we get around that, so we just have to worry about the operating system crashing, which it does about every three-point-nine days, so we'll have to institute a regime of nightly restarts that the interns can do whenever they're taking a break from the power-generating hamster wheels.&quot;</p><p>All of which might be completely true, but is utterly the wrong thing to say. The sponsor has surely stopped listening after the word, &quot;Well...&quot;</p><p>The problem is that you see your part as an engineering role, while your sponsor clearly understands he's engaged in a negotiation. And in a negotiation, the last thing you want to do is make concessions on the first demand. In fact, the right response to the &quot;do we really need&quot; question is something like this:</p><p>&quot;Without a second server, the whole system will come crashing down at least three times daily, particularly when it's under heaviest load or when you are doing a demo for the Board of Directors. In fact, we really need four servers so we can take an HA pair down independently at any time while still maintaining 100% of our capacity, even in case one of the remaining pair crashes unexpectedly.&quot;</p><p>Of course, we both know you don't really need the third and fourth servers. This is just a gambit to get the sponsor to change the subject to something else. You're upping the ante and showing that you're already running at the bare, dangerous, nearly-irresponsible minimum tolerable configuration. 
And besides, if you do actually get the extra servers, you can certainly use one to make your QA environment match production, and the other will make a great build box.</p><h2>Schedule Quid Pro Quo <br /></h2><p>Another situation in which we harm ourselves by bringing engineering trade-offs to a negotiation comes when the schedule slips. Statistically speaking, we're more likely to pick up the bass line from &quot;La Bamba&quot; from a pair of counter-rotating neutron stars than we are to complete a project on time. Sooner or later, you'll realize that the only way to deliver your project on time and under budget is to reduce it to roughly the scope of &quot;Hello, world!&quot;</p><p>When that happens, being a responsible developer, you'll tell your sponsor that the schedule needs to slip. You may not realize it, but by uttering those words, you've given the international sign of negotiating weakness.</p><p>Your sponsor, who has his or her own reputation---not to mention budget---tied to the delivery of this project, will reflexively respond with, &quot;We can move the date, but if I give you that, then you have to give me these extra features.&quot;</p><p>The project is already going to be late. Adding features will surely make it more late, particularly since you've already established that the team isn't moving as fast as expected. So why would someone invested in the success of the project want to further damage it by increasing the scope? It's about as productive as soaking a grocery store bag (the paper kind) in water, then dropping a coconut into it.</p><p>I suspect that it's sort of like dragging a piece of yarn in front of a kitten. It can't help but pounce on it. It's just what kittens do.</p><p>&nbsp;My only advice in this situation is to counter with data. Produce the burndown chart showing when you will actually be ready to release with the current scope. 
Then show how the fractally iterative cycle of slippage followed by scope creep produces a delivery date that will be moot, as the sun will have exploded before you reach beta.</p><h2>The Fallacy of Capital</h2><p>When something costs a lot, we want to use it all the time, regardless of how well suited it is or is not.</p><p>This is sort of the inverse of budgetecture.&nbsp; For example, relational databases used to cost roughly the same as a battleship. So, managers got it in their heads that everything needed to be in <em>the</em> relational database.&nbsp; Singular. As in, one.</p><p>Well, if one database server is the source of all truth, you'd better be pretty careful with it. And the best way to be careful with it is to make sure that nobody, but nobody, ever touches it. Then you collect a group of people with malleable young minds and a bent toward obsessive-compulsive abbreviation forming, and you make them the Curators of Truth.</p><p>But, because the damn thing cost so much, you need to get your money's worth out of it. So, you mandate that every application must store its data in The Database, despite the fact that nobody knows where it is, what it looks like, or even if it really exists.&nbsp; Like Schrodinger's cat, it might already be gone, it's just that nobody has observed it yet. Still, even that genetic algorithm with simulated annealing, running ten million Monte Carlo fitness tests is required to keep its data in The Database.<br /></p><p>(In the above argument, feel free to substitute IBM Mainframe, WebSphere, AquaLogic, ESB, or whatever your capital fallacy du jour may be.)</p><p>Of course, if databases didn't cost so much, nobody would care how many of them there are. Which is why MySQL, Postgres, SQLite, and the others are really so useful. It's not an issue to create twenty or thirty instances of a free database. There's no need to collect them up into a grand &quot;enterprise data architecture&quot;. In fact, exactly the opposite is true. 
You can finally let independent business units evolve independently. Independent services can own their own data stores, and never let other applications stick their fingers into their guts. </p><p>&nbsp;</p><p>So there you have it, a small sample of the rogues' gallery. These bad relations don't get much photo op time with the CEO, but if you look, you'll find them lurking in some cubicle just around the corner.</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Releasing a free SingleLineFormatter</title><link href="https://michaelnygard.com/blog/2007/12/releasing-a-free-singlelineformatter/"/><id>https://michaelnygard.com/blog/2007/12/releasing-a-free-singlelineformatter/</id><published>2007-12-08T19:14:38-06:00</published><updated>2007-12-08T19:14:38-06:00</updated><content type="html"><![CDATA[<p>A number of readers have asked me for reference implementations of the stability and capacity patterns.</p>
<p>I've begun to create some free implementations to go along with <a href="https://pragprog.com/titles/mnee2/">Release It</a>.  As of today, it just includes a drop-in formatter that you can use in place of the <tt>java.util.logging</tt> default (which is horrible). </p>
<p>This formatter keeps all the fields lined up in columns, including truncating the logger name and method name if necessary.  A columnar format is much easier for the human eye to scan. We all have great pattern-matching machinery in our heads. I can't for the life of me understand why so many vendors work so hard to defeat it. The one thing that doesn't get stuffed into a column is a stack trace. It's good for a stack trace to interrupt the flow of the log file... that's something that you really want to pop out when scanning the file.</p>
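<p>To make the columnar idea concrete, here is a minimal sketch of the same approach using Python's <tt>logging</tt> module. It is not the SingleLineFormatter code from the download, which targets <tt>java.util.logging</tt>; the class name <tt>ColumnFormatter</tt>, the helper <tt>fit</tt>, and the column widths are all assumptions chosen for illustration.</p>

```python
import logging


def fit(text, width):
    """Pad or truncate text so it occupies exactly width characters."""
    return text[:width].ljust(width)


class ColumnFormatter(logging.Formatter):
    """Hypothetical fixed-column, one-line-per-record formatter."""

    def format(self, record):
        line = " ".join([
            self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            fit(record.levelname, 8),
            fit(record.name, 20),            # long logger names get truncated
            fit(record.funcName or "", 16),  # so do long method names
            record.getMessage(),
        ])
        # A stack trace is deliberately left outside the columns so it pops out.
        if record.exc_info:
            line += "\n" + self.formatException(record.exc_info)
        return line
```

<p>Plugging it in is one call, for example <tt>handler.setFormatter(ColumnFormatter())</tt> on any <tt>logging.Handler</tt>.</p>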
<p>It only takes a minute to plug in the SingleLineFormatter. Your admins will thank you for it.</p>
<p><a href="/glb/index.html">Read about the library.</a></p>
<p>Download it as <a href="/glb/ReleaseItCode.zip">.zip</a> or <a href="/glb/ReleaseItCode.tgz">.tgz</a>.</p>
 
]]></content></entry><entry><title>A Dozen Levels of Done</title><link href="https://michaelnygard.com/blog/2007/11/a-dozen-levels-of-done/"/><id>https://michaelnygard.com/blog/2007/11/a-dozen-levels-of-done/</id><published>2007-11-28T16:09:59-06:00</published><updated>2007-11-28T16:09:59-06:00</updated><content type="html"><![CDATA[<p>What does &quot;done&quot; mean to you?&nbsp; I find that my definition of &quot;done&quot; continues to expand. When I was still pretty green, I would say &quot;It's done&quot; when I had finished coding.&nbsp; (Later, a wiser and more cynical colleague taught me that &quot;done&quot; meant that you had not only finished the work, but made sure to tell your manager you had finished the work.)</p><p>The next meaning of &quot;done&quot; that I learned had to do with version control. It's not done until it's checked in.</p><p>Several years ago, I got <a target="_blank" href="http://c2.com/cgi/wiki?TestInfected">test infected</a> and my definition of &quot;done&quot; expanded to include unit testing. 
</p><p>Now that I've <a href="https://pragprog.com/titles/mnee2/">lived in operations</a> for a few years and gotten to know and love <a target="_blank" href="http://www.amazon.com/gp/product/0321150783?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0321150783">Lean Software Development</a>, I have a new definition of &quot;done&quot;.</p><p>Here goes:</p><p>A feature is not &quot;done&quot; until all of the following can be said about it:</p><ol><li>All unit tests are green.</li><li>The code is as simple as it can be.</li><li>It communicates clearly.</li><li>It compiles in the automated build from a clean checkout.</li><li>It has passed unit, functional, integration, stress, longevity, load, and resilience testing.</li><li>The customer has accepted the feature.<br /></li><li>It is included in a release that has been branched in version control.</li><li>The feature's impact on capacity is well-understood.<br /></li><li>Deployment instructions for the release are defined and do not include a &quot;point of no return&quot;.</li><li>Rollback instructions for the release are defined and tested.</li><li>It has been deployed and verified.</li><li>It is generating revenue.</li></ol><p>Until all of these are true, the feature is just unfinished inventory.<br /></p> 
]]></content></entry><entry><title>Postmodern Programming</title><link href="https://michaelnygard.com/blog/2007/11/postmodern-programming/"/><id>https://michaelnygard.com/blog/2007/11/postmodern-programming/</id><published>2007-11-19T08:00:00-06:00</published><updated>2007-11-19T08:00:00-06:00</updated><content type="html"><![CDATA[<p>It's taken me a while to get to this talk. Not because it was uninteresting, just because it sent my mind in so many directions that I needed time to collect my scattered thoughts.</p><h2>Objects and Lego Blocks&nbsp;</h2><p>On Thursday, James Noble delivered a Keynote about &quot;The Lego Hypothesis&quot;. As you might guess, he was talking about the dream of building software as easily as a child assembles a house from Lego bricks. He described it as an old dream, using quotes from the very first conference on Software Engineering... the one where they utterly invented the term &quot;Software Engineering&quot; itself.&nbsp; In 1968.</p><p>The Lego Hypothesis goes something like this: &quot;In the future, software engineering will be set free from the mundane necessity of programming.&quot; To realize this dream, we should look at the characteristics of Lego bricks and see if software at all mirrors those characteristics.</p><p>Noble ascribed the following characteristics to components:</p><ul><li>Small</li><li>Indivisible</li><li>Substitutable</li><li>More similar than different</li><li>Abstract encapsulations</li><li>Coupled to a few, close neighbors</li><li>No action at a distance</li></ul><p>(These actually predate that 1968 software engineering conference by quite a bit. They were first described by the Greek philosopher Democritus in his theory of <em>atomos</em>.)</p><p>The first several characteristics sound a lot like the way we understand objects. 
The last two are problematic, though.</p><p>Examining many different programs and languages, <a href="http://elvis.ac.nz/" target="_blank">Noble's research group</a> has found that objects are typically not connected to just a few nearby objects. The majority of objects are coupled to just one or two others. But the extremal cases are very, very extreme. In a Self program, one object had over 10,000,000 inbound references. That is, it was coupled to more than 10,000,000 other objects in the system. (It's probably 'nil', 'true', 'false', or perhaps the integer object 'zero'.)</p><p>In fact, object graphs tend to form scale-free networks that can be described by <a href="http://www.elvis.ac.nz/brain?ObjectPowerLaws">power laws</a>.</p><p>Lots of other systems in our world form scale-free networks with power law distributions:</p><ul><li>City sizes</li><li>Earthquake magnitudes</li><li>Branches in a roadway network</li><li>The Internet</li><li>Blood vessels</li><li>Galaxy sizes</li><li>Impact crater diameters</li><li>Income distributions</li><li>Book sales</li></ul><p>One of the first things to note about power law distributions is that they are not normal. That is, words like &quot;average&quot; and &quot;median&quot; are very misleading. If the average inbound coupling is 1.2, but the maximum is 10,000,000, how much does the average tell you about the large scale behavior of the system?</p><p>(An aside: this is the fundamental problem that makes random events so problematic in Nassim Taleb's book <a href="http://www.amazon.com/gp/product/1400063515?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1400063515" target="_blank">The Black Swan</a>. Benoit Mandelbrot also considers this in <a href="http://www.amazon.com/gp/product/0465043550?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0465043550" target="_blank">The (Mis)Behavior of Markets</a>. 
Yes, <a href="http://en.wikipedia.org/wiki/Mandelbrot_set" target="_blank"><em>that</em></a> Mandelbrot.)<br /></p><p>Noble made a pretty good case that the Lego Hypothesis is dead as disco. Then came a leap of logic that I must have missed.</p><h2>Postmodernism</h2><p>&quot;The ultimate goal of computer science is the program.&quot;</p><p>You are assigned to write a program to calculate the first 100 prime numbers. If you are a student, you have to write this as if it exists in a vacuum. That is, you code as if this is the first program in the universe. It isn't. Once you leave the unique environs of school, you're not likely to sit down with a pad of lined paper and a mechanical pencil to derive your own prime-number-finding algorithm. Instead, your first stop is probably Google. </p><p>Searching for &quot;<a href="http://www.google.com/search?hl=en&amp;q=prime+number+sieve&amp;btnG=Google+Search" target="_blank">prime number sieve</a>&quot; currently gives me about 644,000 results in three-tenths of a second. The results include implementations in JavaScript, Java, C, C++, FORTRAN, PHP, and many others. In fact, if I really need prime numbers rather than a program to find them, I can just parasitize somebody else's computing power with online prime number generators.</p><p>Noble quotes Steven Connor from the Cambridge Companion to Postmodernism:</p><p style="margin-left: 40px">&quot;...that condition in which, for the first time, and as a result of technologies which allow the large-scale storage, access, and re-production of records of the past, the past appears to be included in the present.&quot;</p><p>In art and literature, postmodernism incorporates elements of past works, directly and by reference. In programming, it means that every program ever written is <span style="font-style: italic">still alive</span>. They are &quot;alive&quot; in the sense that even dead hardware can be emulated. Papers from the dawn of computing are available online. 
There are execution environments for COBOL that run in Java Virtual Machines, possibly on virtual operating systems. Today's systems can completely contain every previous language, program, and execution environment.</p><p>I'm now writing well beyond my actual understanding of postmodern critical theory and trying to report what Noble was talking about in his keynote.</p><p>The same technological changes that caused the rise of postmodernism in art, film, and literature are now in full force in programming. In a very real sense, we did it to ourselves! We technologists and programmers created the technology---globe-spanning networks, high compression codecs, indexing and retrieval, collaborative filtering, virtualization, emulation---that are now reshaping our profession.</p><p>In the age of postmodern programming, there are no longer &quot;correct algorithms&quot;. Instead, there are contextual decisions, negotiations, and contingencies. Instead of The Solution, we have individual solutions that solve problems in a context. This should sound familiar to anyone in the patterns movement.</p><p>Indeed, he directly references patterns and eXtreme Programming as postmodern programming phenomena, along with &quot;scrap-heap&quot; programming, mashups, glue programming, and scripting languages.<br /></p><p>I searched for a great way to wrap this piece up, but ultimately it seemed more appropriate to talk about the contextual impact it had on me. I've never been fond of postmodernism; it always seemed simultaneously precious and pretentious. Now, I'll be giving that movement more attention. Second, I've always thought of mashups as sort of tawdry and sordid---not <span style="font-style: italic">real</span> programming, you know? I'll be reconsidering that position as well.&nbsp;</p> 
]]></content></entry><entry><title>Conference: "Velocity"</title><link href="https://michaelnygard.com/blog/2007/11/conference-velocity/"/><id>https://michaelnygard.com/blog/2007/11/conference-velocity/</id><published>2007-11-16T10:19:56-06:00</published><updated>2007-11-16T10:19:56-06:00</updated><content type="html"><![CDATA[<p>O'Reilly has announced an upcoming conference called Velocity.</p><p>From the announcement:</p><blockquote><p>Web companies, big and small, face many of the same challenges: sites must be faster, infrastructure needs to scale, and everything must be available to customers at all times, no matter what. Velocity is the place to obtain the crucial skills and knowledge to build successful web sites that are fast, scalable, resilient, and highly available.</p><p>Unfortunately, there are few opportunities to learn from peers, exchange ideas with experts, and share best practices and lessons learned.</p><p>Velocity is changing that by providing the best information on building and operating web sites that are fast, reliable, and always up. We're bringing together people from around the world who are doing the best performance work, to improve the experience of web users worldwide. Pages will be faster. Sites will have higher up-time. Companies will achieve more with less. The next cool startup will be able to more quickly scale to serve a larger audience, globally. Velocity is the key for crossing over from cool Web 2.0 features to sustainable web sites.</p></blockquote>    <p>That statement could have been the preface to <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">my book</a>, so I'll be submitting several proposals for talks.<br /></p> 
]]></content></entry><entry><title>Putting My Mind Online</title><link href="https://michaelnygard.com/blog/2007/11/putting-my-mind-online/"/><id>https://michaelnygard.com/blog/2007/11/putting-my-mind-online/</id><published>2007-11-13T15:44:44-06:00</published><updated>2007-11-13T15:44:44-06:00</updated><content type="html"><![CDATA[<p>Along with the longer analysis pieces, I've decided to post the entirety of my notes from QCon San Francisco. A few of my friends and colleagues are fellow mind-mappers, so this is for them.</p><p>Nygard's Mind Map from QCon</p><p>This file works with <a href="http://freemind.sourceforge.net/wiki/index.php/Main_Page" target="_blank">FreeMind</a>, a fast, fluid, and free mind mapping tool.</p> 
]]></content></entry><entry><title>Two Ways To Boost Your Flagging Web Site</title><link href="https://michaelnygard.com/blog/2007/11/two-ways-to-boost-your-flagging-web-site/"/><id>https://michaelnygard.com/blog/2007/11/two-ways-to-boost-your-flagging-web-site/</id><published>2007-11-10T00:52:56-06:00</published><updated>2007-11-10T00:52:56-06:00</updated><content type="html"><![CDATA[<p>Being fast doesn't make you scalable. But it does mean you can handle more load with your current infrastructure. Take a look at this diagram of request handlers.</p>

<p align="center"><img style="float:none;" width="242" height="296" border="0" title="13 Threads Needed When Requests Take 700ms" alt="13 Threads Needed When Requests Take 700ms" src="/images/blog/two_ways/Active_Threads_Slow_Requests.png" /></p>

<p>You can see that it takes 13 request handling threads to process this amount of load when each request takes 700 milliseconds. In the next diagram, the requests arrive at the same rate, but in this picture it takes just 200 milliseconds to answer each one.</p>

<p align="center"><img style="float:none;" width="242" height="298" border="0" title="3 Threads Needed When Requests Take 200ms" alt="3 Threads Needed When Requests Take 200ms" src="/images/blog/two_ways/Active_Threads_Fast_Requests.png" /></p>

<p>Same load, but only 3 request handlers are needed at a time. So, shortening the   processing time means you can handle more transactions during the same unit of time.</p>
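
<p>One way to put rough numbers on this relationship is Little's Law: the average number of requests in flight equals the arrival rate multiplied by the average response time. The sketch below is a back-of-the-envelope illustration, not a capacity model. The arrival rate of 20 requests per second and the function name are assumptions chosen for the example, not values read off the diagrams.</p>

```python
def average_busy_handlers(arrivals_per_second, response_time_ms):
    """Little's Law: mean requests in flight = arrival rate * mean response time."""
    return arrivals_per_second * response_time_ms / 1000


# Hypothetical load of 20 requests per second (an assumed figure).
slow = average_busy_handlers(20, 700)   # each request takes 700 ms
fast = average_busy_handlers(20, 200)   # each request takes 200 ms

print(f"700 ms responses keep about {slow:.0f} handlers busy")
print(f"200 ms responses keep about {fast:.0f} handlers busy")
```

<p>Peak concurrency will run above these averages whenever arrivals are bursty, so treat the result as a floor rather than a pool size.</p>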

<p>Suppose your site is built on the classic &quot;six-pack&quot; architecture shown below. As your traffic grows and the site slows, you're probably looking at adding more oomph to the database servers. Scaling that database cluster up gets expensive very quickly. Worse, you have to bulk up both guns at once, because each one still has to be able to handle the entire load. So you're paying for big boxes that are guaranteed to be 50% idle.</p>

<p align="center"><img style="float:none;" width="101" height="354" border="0" title="Classic Six Pack" alt="Classic Six Pack" src="/images/blog/two_ways/Six_Pack.png" /></p>

<p>Let's look at two techniques almost any site can use to speed up requests, without   having the Hulk Hogan and Andre the Giant of databases lounging around in your data center.</p>

<h2>Cache Farms</h2>
<p>Cache farming doesn't mean armies of Chinese gamers stomping rats and making vests. It doesn't involve registering a ton of domain names, either.</p>

<p>Pretty much every web app is already caching a bunch of things at a bunch of layers.   Odds are, your application is already caching database results, maybe as objects or maybe   just query results. At the top level, you might be caching page fragments. HTTP session objects are nothing but caches. The net result of all this caching is a lot of redundancy. Every app server instance has a bunch of memory devoted to caching. If you're running multiple instances on the same hosts, you could be caching the same object once   per instance.</p>

<p>Caching is supposed to speed things up, right? Well, what happens when those app   server instances get short on memory? Those caches can tie up a lot of heap space. If they do, then instead of speeding things up, the caches will actually slow responses down   as the garbage collector works harder and harder to free up space.</p>

<p>So what do we have? If there are four app instances on each of the six-pack's two app server hosts, then a frequently accessed object---like a product featured on the home page---will be duplicated eight times. Can we do better? Well, since I'm writing this article, you might suspect the answer is &quot;yes&quot;. You'd be right.</p>

<p>The caches I've described so far are in-memory, internal caches. That is, they exist completely in RAM and each process uses its own RAM for caching. There exist products, commercial and open-source, that let you externalize that cache. By moving the cache out of the app server process, you can access the same cache from multiple instances, reducing duplication. With those objects out of the heap, you can make the app server heap smaller, which will also reduce garbage collection pauses. If you make the cache distributed, as well as external, then you can reduce duplication even further.</p>

<p>External caching can also be tweaked and tuned to help deal with &quot;hot&quot; objects. If you   look at the distribution of accesses by ID, odds are you'll observe a power law. That means the popular items will be requested hundreds or thousands of times as often as the average item. In a large infrastructure, making sure that the hot items are on cache   servers topologically near the application servers can make a huge difference in time   lost to latency and in load on the network.</p>

<p>External caches are subject to the same kind of invalidation strategies as internal caches. On the other hand, when you invalidate an item from each app server's internal cache, they're probably all going to hit the database at about the same time. With an external cache, only the first app server hits the database. The rest will find that it's already been re-added to the cache.</p>

<p>External cache servers can run on the same hosts as the app servers, but they are often clustered together on hosts of their own. Hence, the cache farm.</p>

<p align="center"><img style="float:none;" width="240" height="354" border="0" title="Six Pack With Cache Farm" alt="Six Pack With Cache Farm" src="/images/blog/two_ways/Six_Pack_with_Cache_Farm.png" /></p>

<p>If the external cache doesn't have the item, the app server hits the database as   usual. So I'll turn my attention to the database tier.</p>
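
<p>The read path described above can be sketched in a few lines. This is a minimal illustration, not any product's API: <code>ExternalCache</code>, <code>Database</code>, and <code>fetch_product</code> are hypothetical names, and a plain dictionary stands in for the shared cache server.</p>

```python
class ExternalCache:
    """Stand-in for a shared cache server; a plain dict, not a real client."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value

    def delete(self, key):
        self._items.pop(key, None)


class Database:
    """Stand-in for the database tier that counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def load(self, key):
        self.query_count += 1
        return self._rows[key]


def fetch_product(cache, database, product_id):
    """Try the shared cache first, then fall back to the database."""
    item = cache.get(product_id)
    if item is None:
        item = database.load(product_id)
        cache.set(product_id, item)
    return item


cache = ExternalCache()
database = Database({"sku-1": {"name": "Featured product", "price": 19.99}})

# Eight app server instances all ask for the same hot item.
for _ in range(8):
    fetch_product(cache, database, "sku-1")
print(database.query_count)  # 1

# After an invalidation, only the first reader goes back to the database.
cache.delete("sku-1")
for _ in range(8):
    fetch_product(cache, database, "sku-1")
print(database.query_count)  # 2
```

<p>The loop here runs sequentially. With truly concurrent readers, several could miss at once before the first one repopulates the cache, so the count would be small rather than exactly one.</p>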

<h2>Read Pools</h2>
<p>The toughest thing for any database to deal with is a mixture of read and write operations. The write operations have to create locks and, if transactional, locks across multiple tables or blocks. If the same tables are being read, those reads will have highly variable performance, depending on whether a read operation randomly encounters   one of the locked rows (or pages, blocks, or tables, depending).</p>

<p>But the truth is that your application almost certainly does more reads than writes, probably to an overwhelming degree. (Yes, there are some domains where writes exceed reads, but I'm going to momentarily disregard mindless data collection.) For a travel site, the ratio will be about 10:1. For a commerce site, it will be from 50:1 to 200:1.   There are a lot of variables here, especially when you start doing more effective caching, but even then, the ratios are highly skewed.</p>

<p>When your database starts to get that middle-age paunch and it just isn't as zippy as it used to be, think about offloading those reads. At a minimum, you'll be able to scale out instead of up. Scaling out with smaller, consistent, commodity hardware pleases   everyone more than forklift upgrades. In fact, you'll probably get more performance out   of your writes once all that pesky read I/O is off the write master.</p>

<p>How do you create a read pool? Good news! It uses nothing more than built-in replication features of the database itself. Basically, you just configure the write master to ship its archive logs (or whatever your DB calls them) to the read pool databases. They replay the logs to bring their state into sync with the write master.</p>

<p align="center"><img style="float: none;" width="316" height="407" border="0" title="Six Pack With Cache Farm and Read Pool" alt="Six Pack With Cache Farm and Read Pool" src="/images/blog/two_ways/Six_Pack_with_Read_Pool.png" /></p>

<p>By the way, for read pooling, you really want to avoid database clustering approaches. The overhead needed for synchronization obviates the benefits of read pooling in the first place.</p>
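
<p>On the application side, using a read pool comes down to choosing a connection per statement. Here is a minimal sketch of that choice. The class name, the server names, and the round-robin policy are all assumptions for illustration, and a real application would more likely decide in its data access layer than by inspecting SQL text.</p>

```python
import itertools


class ReadPoolRouter:
    """Send writes to the write master and spread reads across the read pool."""

    def __init__(self, write_master, read_pool):
        self._write_master = write_master
        # Round-robin is an assumed policy; any load-balancing scheme would do.
        self._next_reader = itertools.cycle(read_pool)

    def connection_for(self, statement):
        """Pick a server name based on whether the statement reads or writes."""
        verb = statement.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._next_reader)
        return self._write_master


router = ReadPoolRouter("write-master", ["read-1", "read-2", "read-3"])

for sql in [
    "SELECT price FROM product WHERE id = 42",
    "UPDATE product SET price = 9.99 WHERE id = 42",
    "SELECT name FROM product WHERE id = 42",
]:
    print(router.connection_for(sql), "handles:", sql)
```

<p>A read that has to see a write the same user just made would need to go to the write master instead, since the read pool always trails it.</p>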

<p>At this point, you might be objecting, &quot;Wait a cotton-picking minute! That means the read machines are garun-damn-teed to be out of date!&quot; (That's the Foghorn Leghorn version of the objection. I'll let you extrapolate the Tony Soprano and Geico Gecko versions yourself.) You would   be correct. The read machines will always reflect an earlier point in time.</p>

<p>Does that matter?</p>

<p>To a certain extent, I can't answer that. It might matter, depending on your domain and application. But in general, I think it matters less often than it seems. I'll give you an example from the retail domain that I know and love so well. Take a look at this product detail page from BestBuy.com. How often do you think each data field on that page changes? Suppose there is a pricing error that needs to be corrected <em>immediately</em> (for some definition of immediately). What's the total latency before that pricing error will be corrected? Let's look at the end-to-end process.</p>

<ol>
<li>A human detects the pricing error.</li>
<li>The observer notifies the responsible merchant.</li>
<li>The merchant verifies that the price is in error and determines the correct price.</li>
<li>Because this is an emergency, the merchant logs in to the &quot;fast path&quot; system that bypasses the nightly batch cycle.</li>
<li>The merchant locates the item and enters the correct price.</li>
<li>She hits the &quot;publish&quot; button.</li>
<li>The fast path system connects to the write master in production and updates the price.</li>
<li>The read pool receives the logs with the update and applies them.</li>
<li>The read pool process sends a message to invalidate the item in the app servers' caches.</li>
<li>The next time users request that product detail page, they see the correct price.</li>
</ol>

<p>That's the best-case scenario! In the real world, the merchant will be in a meeting when the pricing error is found. It may take a phone call or lookup from another database   to find out the correct price. There might be a quick conference call to make the   decision whether to update the price or just yank the item off the site. All in all, it might take an hour or two before the pricing error gets corrected. Whatever the exact sequence of events, odds are that the replication latency from the write master to the read pool is the very least of the delays.</p>

<p>Most of the data is much less volatile or critical than the price. Is an extra five minutes of latency really a big deal? When it can save you a couple of hundred thousand dollars on giant database hardware?</p>

<h2>Summing It Up</h2>

<p>The reflexive answer to scaling is, &quot;Scale out at the web and app tiers, scale up in the data tier.&quot;  I hope this shows that there are other avenues to improving   performance and capacity.</p>

<h3>References</h3>

<p>For more on read pooling, see Cal Henderson's excellent book, "Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications".</p>

<p>The most popular open-source external caching framework I've seen is <a href="http://www.danga.com/memcached/" target="_blank">memcached</a>. It's a flexible, multi-lingual caching daemon.</p>

<p>On the commercial side, <a href="https://www.gigaspaces.com" target="_blank">GigaSpaces</a> provides distributed, external, clustered caching. It adapts to the &quot;hot item&quot; problem dynamically to keep a good distribution of traffic, and it can be configured to move cached items closer to the servers that use   them, reducing network hops to the cache.</p>
]]></content></entry><entry><title>Two Quick Observations</title><link href="https://michaelnygard.com/blog/2007/11/two-quick-observations/"/><id>https://michaelnygard.com/blog/2007/11/two-quick-observations/</id><published>2007-11-09T14:13:41-06:00</published><updated>2007-11-09T14:13:41-06:00</updated><content type="html"><![CDATA[<p>Several of the speakers here have echoed two themes about databases.</p><p>1. MySQL is in production in a lot of places. I think the high cost of commercial databases (read: Oracle) leads to a kind of budgetechture that concentrates all data in a single massive database. If you remove that cost from the equation, the idea of either functionally partitioning your data stores or creating multiple shards becomes much more palatable.</p><p>2. By far the most common database cluster structure has one write master with many read servers. Ian Flint spoke to us about the architectures behind Yahoo Groups and Yahoo Bix. Bix has 30 MySQL read servers and just one write master. Dan Pritchett from eBay had a similar ratio. (His might have been 10:1 rather than 30:1.) In a commerce site, where 98% of the traffic is browsing and only 2% is buying, a read-pooled cluster makes a lot of sense.</p> 
]]></content></entry><entry><title>Three Vendors Worth Evaluating</title><link href="https://michaelnygard.com/blog/2007/11/three-vendors-worth-evaluating/"/><id>https://michaelnygard.com/blog/2007/11/three-vendors-worth-evaluating/</id><published>2007-11-09T11:11:48-06:00</published><updated>2007-11-09T11:11:48-06:00</updated><content type="html"><![CDATA[<p>Several vendors are sponsoring QCon. (One can only wonder what the registration fees would be if they didn't.) Of these, I think three have products worth immediate evaluation.</p><h2>Semmle</h2><p>In the category of &quot;really cool, but would I pay for it?&quot; is <a target="_blank" href="http://semmle.com/">Semmle</a>. Their flagship product, SemmleCode, lets you treat your codebase as a database against which you can run queries. SemmleCode groks the static structure of your code, including relationships and dependencies. Along the way, it calculates pretty much every OO metric yet invented. It also looks at the source repository.</p><p>What can you do with it? Well, you can create a query that shows you all the cyclic dependencies in your code. The results can be rendered as a tree with explanations, a graph, or a chart. Or, you can chart your distribution of cyclomatic complexity scores over time. You can look for the classes or packages most likely to create a ripple effect.</p><p>Semmle ships with a sample project: the open-source drawing framework JHotDraw. In a stunning coincidence, I'm a contributor to JHotDraw. I wrote the glue code that uses Batik to export a drawing as SVG. So I can say with confidence, that when Semmle showed all kinds of cyclic dependencies in the exporters, it's absolutely correct. Every one of the queries I saw run against JHotDraw confirmed my own experience with that codebase. Where Semmle indicated difficulty, I had difficulty. 
Where Semmle showed JHotDraw had good structure, it was easy to modify and extend.</p><p>There are an enormous number of things you could do with this, but one thing they currently lack is build-time automation. Semmle integrates with Eclipse, but not Ant or Maven. I'm told that's coming in a future release.</p><h2>3tera</h2><p>Virtualization is a hot topic. VMware has the market lead in this space, but I'm very impressed with 3tera's AppLogic.</p><p>AppLogic takes virtualization up a level. It lets you visually construct an entire infrastructure, from load balancers to databases, app servers, proxies, mail exchangers, and everything. These are components they keep in a library, just like transistors and chips in a circuit design program.</p><p>Once you've defined your infrastructure, a single button click will deploy the whole thing into the grid OS. And there's the rub. AppLogic doesn't work with just any old software and it won't work on top of an existing &quot;traditional&quot; infrastructure.</p><p>As a comparison, HP's SmartFrog just runs an agent on a bunch of Windows, Linux, or HP-UX servers. A management server sends instructions to the agents about how to deploy and configure the necessary software. So SmartFrog could be layered on top of an existing traditional infrastructure.</p><p>Not so with AppLogic. You build a grid specifically to support this deployment style. That makes it possible to completely virtualize load balancers and firewalls along with servers. Of course, it also means complete and total lock-in to 3tera.</p><p>Still, for someone like a managed hosting provider, 3tera offers the fastest, most complete definition and provisioning system I've seen.</p><h2>GigaSpaces</h2><p>What can I say about <a href="http://www.gigaspaces.com/" target="_blank">GigaSpaces</a>? Anyone who's heard me speak knows that I adore tuple-spaces. GigaSpaces is a tuple-space in the same way that Tibco is a pub-sub messaging system. 
That is to say, the foundation is a tuple-space, but they've added high-level capabilities based on their core transport mechanism.</p><p>So, they now have a distributed caching system.&nbsp; (They call it an &quot;in-memory data grid&quot;. Um, OK.) There's a database gateway, so your front end can put a tuple into memory (fast) while a back-end process takes the tuple and writes it into the database.</p><p>Just this week, they announced that their entire stack is free for startups. (Interesting twist: most companies offer the free stuff to open-source projects.) They'll only start charging you money when you get over $5M in revenue.&nbsp;</p><p>I love the technology. I love the architecture.</p> 
]]></content></entry><entry><title>Catching up through the day</title><link href="https://michaelnygard.com/blog/2007/11/catching-up-through-the-day/"/><id>https://michaelnygard.com/blog/2007/11/catching-up-through-the-day/</id><published>2007-11-09T11:03:36-06:00</published><updated>2007-11-09T11:03:36-06:00</updated><content type="html"><![CDATA[<p>One of the great things about virtual infrastructure is that you can treat it as a service. I use Yahoo's shared hosting service for this blog. That gives me benefits: low cost and very quick setup. On the down side, I can't log in as root. So when Yahoo has a problem, I have a problem.</p><p>Yesterday, there was something wrong with Yahoo's install of Movable Type. As a result, I couldn't post my &quot;five things&quot;. I'll be catching up today, as time permits.</p><p>My butt is planted in one track all day today, &quot;Architectures You've Always Wondered About.&quot; We'll be hearing about the architecture that runs Second Life, Yahoo, eBay, LinkedIn, and Orbitz. I may need a catheter and an IV.<br /></p> 
]]></content></entry><entry><title>Architecting for Latency</title><link href="https://michaelnygard.com/blog/2007/11/architecting-for-latency/"/><id>https://michaelnygard.com/blog/2007/11/architecting-for-latency/</id><published>2007-11-09T09:42:43-06:00</published><updated>2007-11-09T09:42:43-06:00</updated><content type="html"><![CDATA[<p>Dan Pritchett, Technical Fellow at eBay, spoke about &quot;Architecting for Latency&quot;. His aim was not to talk about minimizing latency, as you might expect, but rather to architect as though you believe latency is unavoidable and real.</p><p>We all know the effect latency can have on performance. That's the first-level effect. If you consider synchronous systems---such as SOAs or replicated DR systems---then latency has a dramatic effect on scalability as well. Whenever a synchronous call reaches across a long wire, the latency gets added directly to the processing time.</p><p>For example, if client A calls service B, then A's processing time will be <em>at least</em> the sum of B's processing time, plus the latency between A and B. (Yes, it seems obvious when you state it like that, but many architectures still act as though latency is zero.)</p><p>Furthermore, latency over IP networks is fundamentally variable. That means A's performance is unpredictable, and can never be made predictable.</p><p>Latency also introduces semantic problems. A replicated database will always have some discrepancy with the master database. A functionally or horizontally partitioned system will either allow discrepancies or must serialize traffic and give up scaling. You can imagine that eBay is much more interested in scaling than serializing traffic.</p><p>For example, when a new item is posted to eBay, it does not immediately show up in the search results. The ItemNode service posts a message that eventually causes the item to show up in search results. 
Admittedly, this is kept to a very short period of time, but still, the item will reach different data centers at different times. So, the search service inside the nearest data center will get the item before the search service inside the farthest. I suspect many eBay users would be shocked, and probably outraged, to hear that shoppers see different search results depending on where they are.</p><p>Now, the search service is designed to get consistent within a limited amount of time---<em>for that item</em>. With a constant flow of items, being posted from all over the country, you can imagine that there is a continuous variance among the search services. Like the quantum foam, however, this is near-impossible to observe. One user cannot see it, because a user gets pinned to a single data center. It would take multiple users, searching in the right category, with synchronized clocks, taking snapshots of the results to even observe that the discrepancies happen. And even then, they would only have a <em>chance</em> of seeing it, not a certainty.&nbsp;</p><p>Another example. Dan talked about payment transfer from one user to another. In the traditional model, that would look something like this.</p><p align="center"><img style="float:none;" width="281" height="304" border="0" title="Synchronous write to both databases" alt="Synchronous write to both databases" src="/images/blog/latency/Zero_Latency_Assumption.png" /></p><p>You can think of the two databases as being either shards that contain different users or tables that record different parts of the transaction.</p><p>This is a design that pretends latency doesn't exist. In other words, it subscribes to Fallacy #2 of the <a href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing" target="_blank">Fallacies of Distributed Computing</a>. 
Performance and scalability will suffer here.<br />  </p><p>(Frankly, it has an availability problem, too, because the availability of the payment service Pr(Payment) will now be Pr(Database 1) * Pr(Network 1) * Pr(Database 2) * Pr(Network 2). In other words, the availability of the payment service is coupled to the availability of Database 1, Database 2, and the two networks connecting Payment to the two databases.)<br /></p><p>Instead, Dan recommends a design more like this:</p><p align="center"><img style="float: none;" width="309" height="467" border="0" src="/images/blog/latency/Architected_for_Latency.png" alt="Payment service with back end reconciliation" title="Payment service with back end reconciliation" /></p><p align="left">In this case, the payment service can set an expectation with the user that the money will be credited within some number of minutes. The availability and performance of the payment service is now independent from that of Database 2 and the reconciliation process. Reconciliation happens in the back end. It can be message-driven, batch-driven, or whatever. The main point is that it is decoupled in time and space from the payment service. Now the databases can exist in the same data center or on separate sides of the globe. Either way, the performance, availability, and scalability characteristics of the payment service don't change. That's architecting for latency.</p><p align="left">Instead of ACID semantics, we think about BASE semantics. (A cute retronym for <strong>B</strong>asically <strong>A</strong>vailable <strong>S</strong>oft-state <strong>E</strong>ventually-consistent.)&nbsp;</p><p align="left">Now, many analysts, developers, and business users will object to the loss of global consistency. 
We heard a spirited debate last night between Dan and Nati Shalom, founder and CTO of GigaSpaces about that very subject.</p><p align="left">I have two arguments of my own to support architecting for latency.</p><p align="left">First, any page you show to a user represents a point-in-time snapshot. It can always be inaccurate even by the time you finish generating the page. Think about a commerce site saying &quot;Ships in 2 - 3 days&quot;. That's true at the instant when the data is fetched. Ten milliseconds later, it might not be true. By the time you finish generating the page and the user's browser finishes rendering it (and fetching the 29 JavaScript files needed for the whizzy AJAX interface) the data is already a few seconds old. So global consistency is kind of useless in that case, isn't it? Besides, I can guarantee there's already a large amount of latency in the data path from inventory tracking to the availability field in the commerce database anyway.<br /></p><p align="left">Second, the cost of global consistency is global serialization. If you assume a certain volume of traffic you must support, the cost of a globally consistent solution will be a <em>multiple</em> of the cost of a latency-tolerant solution. That's because global consistency can only be achieved by making a single master system of record. When you try to reach large scales, that master system of record is going to be hideously expensive.</p><p align="left">Latency is simply an immutable fact of the universe. If we architect for it, we can use it to our advantage. If we ignore it, we will instead be its victims.<br /></p><p align="left">For more of Dan's thinking about latency, see <a href="http://www.infoq.com/articles/pritchett-latency" target="_blank">his article</a> on <a href="http://www.infoq.com/" target="_blank">InfoQ.com</a>.&nbsp;</p> 
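<p align="left">To put numbers on the availability coupling in the synchronous payment design, here is a small sketch of the multiplication. The 99.9% figure is an assumption picked for illustration, the function name is hypothetical, and the arithmetic treats component failures as independent, just as the product of probabilities does.</p>

```python
import math


def serial_availability(*component_availabilities):
    """Availability of a call that needs every listed component to be up."""
    return math.prod(component_availabilities)


# Assumed figure for illustration: every component is up 99.9% of the time.
A = 0.999

# Synchronous design: Payment needs both databases and both networks.
synchronous = serial_availability(A, A, A, A)

# Latency-tolerant design: Payment needs only Database 1 and its network.
decoupled = serial_availability(A, A)

print(f"synchronous: {synchronous:.4%}")
print(f"decoupled:   {decoupled:.4%}")
```

<p align="left">Every extra synchronous dependency multiplies in another factor below one, so the coupled design can only be less available than the decoupled one.</p>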
]]></content></entry><entry><title>SOA Without the Edifice</title><link href="https://michaelnygard.com/blog/2007/11/soa-without-the-edifice/"/><id>https://michaelnygard.com/blog/2007/11/soa-without-the-edifice/</id><published>2007-11-08T08:38:59-06:00</published><updated>2007-11-08T08:38:59-06:00</updated><content type="html"><![CDATA[<p>Sometimes the best interactions at a conference aren't the talks, they're shouting. A crowded bar with an over-amped DJ may seem like an unlikely place for a discussion on SOA. Even so, when it's Jim Webber, ThoughtWorks' SOA practice lead, doing the shouting, it works. Given that Jim's topic is &quot;Guerrilla SOA&quot;, shouting is probably more appropriate than the hushed and reverential cathedral whispers that usually attend SOA discussions.</p><p>Jim's attitude is that SOA projects tend to attract two things: Taj Mahal architects and parasitic vendors. (My words, not Jim's.) The combined efforts of these two groups result in monumentally expensive edifices that don't deliver value. Worse still, these efforts consume work and attention that could go to building services aligned with the real business processes, not some idealized vision of what the processes ought to be.</p><p>Jim says that services should be aligned with business processes. When the business process changes, change the service. (To me, this automatically implies that the service cannot be owned by some enterprise governance council.) When you retire the business process, simply retire the service.</p><p>This sounds like such common sense that it's hard to imagine it could be controversial.</p><p>I'll be in the front row for Jim's talk later today.</p> 
]]></content></entry><entry><title>Cameron Purdy: 10 Ways to Botch Enterprise Java Scalability and Reliability</title><link href="https://michaelnygard.com/blog/2007/11/cameron-purdy-10-ways-to-botch-enterprise-java-scalability-and-reliability/"/><id>https://michaelnygard.com/blog/2007/11/cameron-purdy-10-ways-to-botch-enterprise-java-scalability-and-reliability/</id><published>2007-11-07T22:08:52-06:00</published><updated>2007-11-07T22:08:52-06:00</updated><content type="html"><![CDATA[<p>Here at QCon, well-known Java developer Cameron Purdy gave a fun talk called &quot;10 Ways to Botch Enterprise Java Scalability and Reliability&quot;.&nbsp; (He also gave this talk at JavaOne.)</p><p>While I could quibble with Cameron's counting---there were actually more like 16 points thanks to some numerical overloading---I liked his content.&nbsp; He echoes many of the antipatterns from <a href="https://pragprog.com/titles/mnee2/">Release It</a>. &nbsp; In particular, he talks about the problem I call &quot;Unbounded Result Sets&quot;.&nbsp; That is, whether using an ORM tool or straight database queries, you can always get back more than you expect.&nbsp; </p><p>Sometimes, you get back way, <em>way</em> more than you expect. 
I once saw a small messaging table, that normally held ten or twenty rows, grow to over ten million rows.&nbsp; The application servers never contemplated there could be so many messages.&nbsp; Each one would attempt to fetch the entire contents of the table and turn them into objects.&nbsp; So, each app server would run out of memory and crash.&nbsp; That rolled back the transaction, allowing the next app server to impale itself on the same table.</p><p>Unbounded Result Sets don't just happen from &quot;SELECT * FROM FOO;&quot;, though.&nbsp; Think about an ORM handling the parent-child relationship for you.&nbsp; Simply calling something like customer.getOrders() will return every order for that customer.&nbsp; By writing that call, you implicitly assume that the set of orders for a customer will always be small.&nbsp; Maybe.&nbsp; Maybe not.&nbsp; How about blogUser.getPosts()?&nbsp; Or tickerSymbol.getTrades()?</p><p>Unbounded Result Sets also happen with web services and SOAs.&nbsp; A seemingly innocuous request for information could create an overwhelming deluge---an avalanche of XML that will bury your system.&nbsp; At the least, reading the results can take a long time.&nbsp; In the worst case, you will run out of memory and crash.</p><p>The fundamental flaw with an Unbounded Result Set is that you are trusting someone else not to harm you, either a data producer or a remote web service.&nbsp; </p><p>Take charge of your own safety!&nbsp; </p><p>Be defensive! </p><p>Don't get hurt again in another dysfunctional relationship!<br /></p> 
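<p>One defensive habit against Unbounded Result Sets is to put an explicit ceiling on every fetch and to notice when you hit it. The sketch below uses Python's built-in <code>sqlite3</code> module purely for illustration. The <code>messages</code> table, the <code>fetch_bounded</code> helper, and the limit of 100 rows are all hypothetical.</p>

```python
import sqlite3


def fetch_bounded(connection, max_rows):
    """Fetch at most max_rows messages and report whether more were waiting."""
    # Ask for one extra row so truncation can be detected without a COUNT(*).
    cursor = connection.execute(
        "SELECT id, body FROM messages ORDER BY id LIMIT ?", (max_rows + 1,)
    )
    rows = cursor.fetchall()
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


# Hypothetical messaging table that has grown far beyond its usual size.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
connection.executemany(
    "INSERT INTO messages (body) VALUES (?)",
    (("message %d" % n,) for n in range(10000)),
)

rows, truncated = fetch_bounded(connection, max_rows=100)
print(len(rows), truncated)  # 100 True
```

<p>The caller now works through a bounded batch and can log or alert on the truncation flag, instead of trying to turn the whole table into objects.</p>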
]]></content></entry><entry><title>Three Programming Language Problems Solved Forever</title><link href="https://michaelnygard.com/blog/2007/11/three-programming-language-problems-solved-forever/"/><id>https://michaelnygard.com/blog/2007/11/three-programming-language-problems-solved-forever/</id><published>2007-11-07T21:38:41-06:00</published><updated>2007-11-07T21:38:41-06:00</updated><content type="html"><![CDATA[<p>It's often been the case that a difficult problem can be made easier by transforming it into a different representation.&nbsp; Nowhere is that more true than in mathematics and the pseudo-mathematical realm of programming languages.</p><p>For example, LISP, Python, and Ruby all offer beautiful and concise constructs for operating on lists of things.&nbsp; In each of them, you can make a function which iterates across a list, performing some operation on each element, and returning the resulting list.&nbsp; C, C++, and Java do not offer any similar construct.&nbsp; In each of these languages, iterating a list is a control-flow structure that requires multiple lines to express.&nbsp; More significantly, the function expression of list comprehension can be <em>composed</em>. That is, you can embed a list comprehension structure inside of another function call or list operation.&nbsp; In reading <a href="https://www.amazon.com/gp/product/0596529325?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0596529325">Programming Collective Intelligence</a>, which uses Python as its implementation language, I've been amazed at how eloquent complex operations can be, especially when I mentally transliterate the same code into Java.</p><p>In the evening keynote at QCon, <a href="http://en.wikipedia.org/wiki/Richard_Gabriel" target="_blank">Richard Gabriel</a> covered 50 language topics, with a 50 word statement about each---along with a blend of music, art, and poetry. 
(If you've never seen Richard perform at a conference, it's quite an experience.)&nbsp; His presentation &quot;50 in 50&quot; also covered 50 years of programming and introduced languages as diverse as COBOL, SNOBOL, Piet, LISP, Perl, C, Algol, APL, IPL, Befunge, and HQ9+.</p><p>HQ9+ particularly caught my attention.&nbsp; It takes the question of &quot;simplifying the representation of problems&quot; to the utmost extreme.</p><p>HQ9+ has a simple grammar.&nbsp; There are 4 operations, each represented by a single character.<br /></p><p>'+' increments the register.</p><p>'H' prints every language's natal example, &quot;Hello, world!&quot;&nbsp;</p><p>'Q' makes every program into a <a href="http://en.wikipedia.org/wiki/Quine_%28computing%29">quine</a>.&nbsp; It causes the interpreter to print the program text.&nbsp; Quines are notoriously difficult assignments for second-year CS students.</p><p>'9' causes the interpreter to print the lyrics to the song &quot;99 Bottles of Beer on the Wall.&quot;&nbsp; This qualifies HQ9+ as a real programming language, suitable for inclusion in the <a href="http://99-bottles-of-beer.net/">ultimate list of languages</a>.</p><p>Three of these operators (H, Q, and 9) solve some very commonly expressed problems.&nbsp; In a certain sense, they are the ultimate solution to those problems.&nbsp; They cannot be reduced any further... you can't get shorter than one character. <br /></p><p>Of course, in an audience of programmers, HQ9+ always gets a laugh.&nbsp; In fact, it was created specifically to make programmers laugh.&nbsp; And, in fact, it's a kind of meta-level humor. It's not the programs that are funny, but the design of the language itself... an inside joke from one programmer to the rest of us.<br /></p> 
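<p>The whole language fits in a few lines. What follows is a rough sketch of an interpreter in Python, not a reference implementation: the function names are made up here, ignoring unrecognized characters is an assumption, and the song is abbreviated to the counting verses.</p>
<pre><code class="language-python">
def count(n):
    """Spell out a bottle count the way the song does."""
    if n == 0:
        return "no more bottles"
    if n == 1:
        return "1 bottle"
    return f"{n} bottles"


def bottles_of_beer(start=99):
    """Abbreviated lyrics: stops once the last bottle is gone."""
    verses = []
    for n in range(start, 0, -1):
        verses.append(
            f"{count(n)} of beer on the wall, {count(n)} of beer.\n"
            f"Take one down and pass it around, "
            f"{count(n - 1)} of beer on the wall."
        )
    return "\n\n".join(verses)


def run_hq9plus(program):
    """Interpret an HQ9+ program; return (output text, register value)."""
    output = []
    register = 0
    for op in program:
        if op == "H":
            output.append("Hello, world!")
        elif op == "Q":
            # A quine: print the program's own source text.
            output.append(program)
        elif op == "9":
            output.append(bottles_of_beer())
        elif op == "+":
            # Increment the register. Nothing in the language reads it back.
            register += 1
        # Any other character is ignored (an assumption of this sketch).
    return "\n".join(output), register
</code></pre>
<p>Running <code>run_hq9plus("HQ+")</code> prints the greeting, then the program text itself, and leaves the register at 1.</p>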
]]></content></entry><entry><title>Eric Evans: Strategic Design</title><link href="https://michaelnygard.com/blog/2007/11/eric-evans-strategic-design/"/><id>https://michaelnygard.com/blog/2007/11/eric-evans-strategic-design/</id><published>2007-11-07T21:23:57-06:00</published><updated>2007-11-07T21:23:57-06:00</updated><content type="html"><![CDATA[<p>Eric Evans, author of <a target="_blank" href="https://www.amazon.com/gp/product/0321125215?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0321125215">Domain-Driven Design</a> and founder of <a target="_blank" href="http://www.domainlanguage.com/">Domain Language</a>, embodies the philosophical side of programming.</p><p>He gave a wonderful talk on &quot;Strategic Design&quot;.&nbsp; During this talk, he stated a number of maxims that are worth pondering.</p><p><strong>&quot;Not all of a large system will be well designed.&quot;</strong></p><p><strong>&quot;There are always multiple models.&quot;</strong></p><p><strong>&quot;The diagram is not the model, but it is an expression of part of the model.&quot;</strong></p><p>These are not principles to be followed, Evans says. Rather, these are fundamental laws of the universe. We must accept them and act accordingly, because disregarding them ends in tears.</p><p>Much of this material comes from Part 4 of Domain-Driven Design.&nbsp; Evans laconically labeled this, &quot;The part no one ever gets to.&quot;&nbsp; Guilty.&nbsp; But when I get back home to my library, I will make another go of it.</p><p>Evans also discusses the relative size of code, amount of time spent, and value of the three fundamental portions of a system: the core domain, supporting subdomains, and generic subdomains.</p><p>Generic subdomains are horizontal. You might find these in any system in any company in the world.</p><p>Supporting subdomains are business-specific, but not of value to this particular system. 
That is, they are a necessary cost, but do not provide value.</p><p>The core domain is the reason for the system. It is the business-specific functionality that makes this system worth building.</p><p>Now, in a typical development process (and especially a rewrite project), where does the team's time go? Most of it will go to the largest bulk: the generic subdomains. This is the stuff that has to exist, but it adds no value and is not specific to the company's business. The next largest fraction goes to the supporting subdomains. Finally, the smallest portion of time---and usually the last portion of time---goes to the core domain.</p><p>That means the very last thing delivered is the reason for the system's existence in the first place.&nbsp; Ouch.&nbsp;</p> 
]]></content></entry><entry><title>Kent Beck's Keynote: "Trends in Agile Development"</title><link href="https://michaelnygard.com/blog/2007/11/kent-becks-keynote-trends-in-agile-development/"/><id>https://michaelnygard.com/blog/2007/11/kent-becks-keynote-trends-in-agile-development/</id><published>2007-11-07T12:50:28-06:00</published><updated>2007-11-07T12:50:28-06:00</updated><content type="html"><![CDATA[<p>Kent Beck spoke with his characteristic mix of humor, intelligence, and empathy.&nbsp; Throughout his career, Kent has brought a consistently humanistic view of development.&nbsp; That is, software is written by humans--emotional, fallible, creative, and messy--for other humans.&nbsp; Any attempt to treat development as robotic will end in tears.</p><p>During his keynote, Kent talked about engaging people through appreciative inquiry.&nbsp; This is a learnable technique, based in human psychology, that helps focus on positive attributes.&nbsp; It counters the negativity that so many developers and engineers are prone to.&nbsp; (My take: we spend a lot of time, necessarily, focusing on how things can go wrong.&nbsp; Whether by nature or by experience, that leads us to a pessimistic view of the world.)</p><p>Appreciative inquiry begins by asking, &quot;What do we do well?&quot;&nbsp; Even if all you can say is that the garbage cans get emptied every night, that's at least something that works well.&nbsp; Build from there.</p><p>He specifically recommended <a type="amzn">The Thin Book of Appreciative Inquiry</a>, which I've already ordered.</p>
<p>I should also note that Kent has a new book out, called <a type="amzn">Implementation Patterns</a>, which he described as being about, &quot;Communicating with other people, through code.&quot;</p> 
]]></content></entry><entry><title>From QCon San Francisco</title><link href="https://michaelnygard.com/blog/2007/11/from-qcon-san-francisco/"/><id>https://michaelnygard.com/blog/2007/11/from-qcon-san-francisco/</id><published>2007-11-07T12:39:10-06:00</published><updated>2007-11-07T12:39:10-06:00</updated><content type="html"><![CDATA[<p>I'm at QCon San Francisco this week.&nbsp; (An aside: after being a speaker at <a href="http://www.nofluffjuststuff.com/">No Fluff, Just Stuff</a>, it's interesting to be the audience again.&nbsp; As usual, on returning from travels in a different domain, one has a new perspective on familiar scenes.) This conference targets senior developers, architects, and project managers.&nbsp; One of the very appealing things is the track on &quot;Architectures you've always wondered about&quot;.&nbsp; This covers high-volume architectures for sites such as <a href="http://www.linkedin.com/">LinkedIn</a> and <a href="http://www.ebay.com">eBay</a> as well as other networked applications like <a href="http://www.secondlife.com/">Second Life</a>.&nbsp; These applications live and work in thin air, where traffic levels far outstrip most sites in the world.&nbsp; Performance and scalability are two of my personal themes, so I'm very interested in learning from these pioneers about what happens when you've blown past the limits of traditional 3-tier, app-server centered architecture.</p><p>Through the remainder of the week, I'll be blogging five ideas, insights, or experiences from each day of the conference.</p> 
]]></content></entry><entry><title>Pragmatic Podcast</title><link href="https://michaelnygard.com/blog/2007/10/pragmatic-podcast/"/><id>https://michaelnygard.com/blog/2007/10/pragmatic-podcast/</id><published>2007-10-26T16:44:27-05:00</published><updated>2007-10-26T16:44:27-05:00</updated><content type="html"><![CDATA[<p>Has anyone ever been happy to listen to their own voice?  Probably not.</p>
<p><a href="https://pragprog.com">The Pragmatic Podcast</a> is up and running on the redesigned <a href="https://pragprog.com">Pragmatic Programmers</a> site.  In the first episode, Daniel Steinberg interviews me about <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0978739213">the book</a>.</p>
<p>Also available on iTunes.</p>

 
]]></content></entry><entry><title>Make Time a Weapon</title><link href="https://michaelnygard.com/blog/2007/10/make-time-a-weapon/"/><id>https://michaelnygard.com/blog/2007/10/make-time-a-weapon/</id><published>2007-10-23T21:50:30-05:00</published><updated>2007-10-23T21:50:30-05:00</updated><content type="html"><![CDATA[<p>Here's a list of books about putting time to work as your own weapon, instead of being victimized by it:</p>
<ul>
<li>&nbsp;<a type="amzn">The Mind of War</a></li>
<li>&nbsp;<a type="amzn">The Art of Maneuver</a></li>
<li>&nbsp;<a type="amzn">Lean Thinking</a></li>
<li>&nbsp;<a type="amzn">Birth of the Chaordic Age</a></li>
<li>&nbsp;<a type="amzn">Agile Software Development</a></li>
<li>&nbsp;<a type="amzn">Lean Software Development</a></li>
<li>&nbsp;<a type="amzn">Software by Numbers</a></li>
</ul> 
]]></content></entry><entry><title>Normal Accidents</title><link href="https://michaelnygard.com/blog/2007/09/normal-accidents/"/><id>https://michaelnygard.com/blog/2007/09/normal-accidents/</id><published>2007-09-24T10:27:53-05:00</published><updated>2007-09-24T10:27:53-05:00</updated><content type="html"><![CDATA[<p>While I was writing <a type="amzn" asin="0978739213">Release It!</a>, I was influenced by James R. Chiles's book <a type="amzn" asin="0066620821">Inviting Disaster</a>. One of Chiles's sources is <a type="amzn" asin="0691004129">Normal Accidents</a>, by Charles Perrow. I've just started reading, and even the first two pages offer great insight.</p><p>Normal Accidents describes systems that are inherently unstable, to the point that system failures are inevitable and should be expected.&nbsp; These &quot;normal&quot; accidents result from systems that exhibit the characteristics of high &quot;interactive complexity&quot; and &quot;tight coupling&quot;.</p><p>Interactive complexity refers to internal linkages, hidden from the view of operators. These invisible relations between components or subsystems produce multiple effects from a single cause.&nbsp; They can also produce outcomes that do not seem to relate to their inputs.</p><p>In software systems, interactive complexity is endemic. Any time two programs share a server or database, they are linked. Any time a system contains a feedback loop, it inherently has higher interactive complexity. Feedback loops aren't always obvious.&nbsp; For example, suppose a new software release consumes a fraction more CPU per transaction than before. That small increment might push the server from a non-contending regime into a contending one.&nbsp; Once in contention, the added CPU usage creates more latency. That latency, and the increase in task-switching overhead, produces more latency. Positive feedback.</p><p>High interactive complexity leads operators to misunderstand the system and its warning signs. 
Thus misinformed, they act in ways that do not avert the crisis and may actually precipitate it.&nbsp;</p><p>When processes happen very fast, and there is no way to isolate one part of the system from another, the system is tightly coupled.&nbsp; Tight coupling allows small incidents to spread into large-scale failures.</p><p>Classic &quot;web architecture&quot; exhibits both high interactive complexity and tight coupling. Hence, we should expect &quot;normal&quot; accidents.&nbsp; Uptime will be dominated by the occurrence of these accidents, rather than the individual probability of failure in each component.</p><p>The first section of <a type="amzn" asin="0978739213">Release It!</a> deals exclusively with system stability.&nbsp; It shows how to reduce coupling and diminish interactive complexity.&nbsp;</p> 
]]></content></entry><entry><title>You Keep Using That Word. I Do Not Think It Means What You Think It Means.</title><link href="https://michaelnygard.com/blog/2007/09/you-keep-using-that-word.-i-do-not-think-it-means-what-you-think-it-means./"/><id>https://michaelnygard.com/blog/2007/09/you-keep-using-that-word.-i-do-not-think-it-means-what-you-think-it-means./</id><published>2007-09-16T10:35:17-05:00</published><updated>2007-09-16T10:35:17-05:00</updated><content type="html"><![CDATA[<p>&quot;Scalable&quot; is a tricky word. We use it like there's one single definition. We speak as if it's binary: this architecture is scalable, that one isn't.</p><p>The first really tough thing about scalability is finding a useful definition.  Here's the one I use: </p><blockquote><p>Marginal revenue / transaction &gt; Marginal cost / transaction</p></blockquote><p>The cost per transaction has to account for all cost factors: bandwidth, server capacity, physical infrastructure, administration, operations, backups, and the cost of capital. </p><p>(And, by the way, it's even better when the ratio of revenue to cost per transaction  <em>grows </em> as the volume increases.)</p><p>The second really tough thing about scalability and architecture is that there isn't one that's  <strong>right</strong>.&nbsp; An architecture may work perfectly well for a range of transaction volumes, but fail badly as one variable gets large.</p><p>Don't treat &quot;scalability&quot; as either a binary issue or a moral failing. Ask instead, &quot;how far will this architecture scale before the marginal cost deteriorates relative to the marginal revenue?&quot; Then, follow that up with, &quot;What part of the architecture will hit a scaling limit, and what can I incrementally replace to remove that limit?&quot;</p> 
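<p>To make the inequality concrete, here is a small worked sketch in Python. The function names and the dollar figures are invented for illustration; the only idea taken from the definition above is comparing marginal revenue per transaction with marginal cost per transaction between two volume levels.</p>
<pre><code class="language-python">
def marginal_per_transaction(before, after, tx_before, tx_after):
    """Change in a total (revenue or cost) per additional transaction."""
    return (after - before) / (tx_after - tx_before)


def scaling_margin(revenue, cost, transactions):
    """Marginal revenue minus marginal cost, per transaction.

    Each argument is a (before, after) pair measured at two volumes.
    A positive result means the architecture still scales over that range;
    a negative one means each extra transaction loses money.
    """
    tx_before, tx_after = transactions
    marginal_revenue = marginal_per_transaction(*revenue, tx_before, tx_after)
    marginal_cost = marginal_per_transaction(*cost, tx_before, tx_after)
    return marginal_revenue - marginal_cost


# Hypothetical numbers: doubling volume from 1,000 to 2,000 transactions
# adds 4,000 in revenue but 5,000 in total cost.
print(scaling_margin((5000.0, 9000.0), (3000.0, 8000.0), (1000, 2000)))  # -1.0
</code></pre>
<p>In this made-up case the answer to &quot;how far will this architecture scale?&quot; is &quot;not as far as 2,000 transactions,&quot; which is the cue to go looking for the part that hit its limit.</p>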
]]></content></entry><entry><title>Engineering in the White Space</title><link href="https://michaelnygard.com/blog/2007/09/engineering-in-the-white-space/"/><id>https://michaelnygard.com/blog/2007/09/engineering-in-the-white-space/</id><published>2007-09-13T12:59:27-05:00</published><updated>2007-09-13T12:59:27-05:00</updated><content type="html"><![CDATA[<p><strong>&quot;Is software Engineering, or is it Art?&quot;</strong></p><p>Debate between the Artisans and the Engineers has simmered, and occasionally boiled, since the very introduction of the phrase &quot;Software Engineering&quot;.&nbsp; I won't restate all the points on both sides here, since I would surely forget someone's pet argument, and also because I see no need to be redundant.</p><p>Deep in my heart, I believe that building programs is art and architecture, but not engineering.</p><p>But, what if you're not just building programs?</p><p><strong>Programs and Systems</strong></p><p>A &quot;program&quot; has a few characteristics that I'll assign here:</p><ol><li>It accepts input.</li><li>It produces output.</li><li>It runs a sequence of instructions.</li><li>Statically, it exhibits cohesion in its executable form. [*]<br /></li><li>Dynamically, it exhibits cohesion in its address space. [**]<br /></li></ol><p>* That is, the transitive closure of all code to be executed is finite, although it may not all be known in advance of execution.&nbsp; This allows dynamic extension via plugins, but not, for example, dynamic execution of any scripts or code found on the Web.&nbsp; So, a web browser is a program, but Javascript executed on some page is an independent program, not part of the browser itself.</p><p>** For &quot;address space&quot;, feel free to substitute &quot;object space&quot;, &quot;process space&quot;, or &quot;virtual memory&quot;. 
Cohesion requires that all the code that can access the address space should be regarded as a single program.&nbsp; (IPC through shared memory is a special case of an output, and should be considered more akin to a database or memory-mapped file than to part of the program's own address space.)<br /></p><p>Suppose you have two separate scripts that each manipulate the same database.&nbsp; I would regard those as two separate---though not independent---programs.&nbsp; A single instance of Tomcat may contain several independent programs, but all the servlets in one EAR file are part of one program.</p><p>For the moment, I will not consider trivial objections, such as two distinct sets of functionality that happen to be packaged and delivered in a single EAR file.&nbsp; It's less interesting to me whether code <span style="font-style: italic">does </span>access the entire address space than whether it <span style="font-style: italic">could</span>.&nbsp; A library checkout program that includes functions for both librarians and patrons may not use common code for card number lookup, but it could.&nbsp; (And, arguably, it should.)&nbsp; That makes it one program, in my eyes.</p><p>A &quot;System&quot;, on the other hand, consists of interdependent programs that have commonalities in their inputs and outputs.&nbsp; They could be arranged in a chain, a web, or a loop.&nbsp; No matter, if one program's input depends on another program's output, then they are part of a system.</p><p>Systems can be composed, whereas programs cannot. 
&nbsp;</p><p style="font-weight: bold">Tricky White Space</p><p>Some programs run all the time, responding to intermittent inputs; these we call &quot;servers&quot;.&nbsp; It is very common to see servers represented as a deceptively simple little rectangle on a diagram.&nbsp; Between servers, we draw little arrows to indicate communication of some sort.</p><p>One little arrow might mean, &quot;Synchronous request/reply using SOAP-XML over HTTP.&quot; That's quite a lot of information for one little glyph to carry.&nbsp; There's not usually enough room to write all that, so we label the unfortunate arrow with either &quot;XML over HTTP&quot;---if viewing it from an internal perspective---or &quot;SKU Lookup&quot;---if we have an external perspective.</p><p>That little arrow, bravely bridging the white space between programs, looks like a direct contact.&nbsp; It is Voyager, carrying its recorded message to parts unknown.&nbsp; It is Arecibo, blasting a hopeful greeting into the endless dark.</p><p>Well, not really...</p><p>These days, the white space isn't as empty as it once was.&nbsp; A kind of luminiferous ether fills the void between servers on the diagram.</p><p><strong>The Substrate</strong></p><p>There is many a slip 'twixt cup and lip.&nbsp; In between points A and B on our diagram, there exist some or all of the following:</p><ul><li>Network interface cards</li><li>Network switches</li><li>Layer 2 - 3 firewalls</li><li>Layer 7 (application) firewalls</li><li>Intrusion Detection and Prevention Systems</li><li>Message queues</li><li>Message brokers</li><li>XML transformation engines</li><li>Flat file translations</li><li>FTP servers</li><li>Polling jobs</li><li>Database &quot;landing zone&quot; tables</li><li>ETL scripts</li><li>Metro-area SONET rings</li><li>MPLS gateways</li><li>Trunk lines</li><li>Oceans</li><li>Ocean liners</li><li>Philippine fishing trawlers (see, &quot;Underwater Cable Break&quot;)</li></ul><p>Even in the simple cases, there will be 
four or five computers between program A and B, each running their own programs to handle things like packet switching, traffic analysis, routing, threat analysis, and so on.</p><p>I've seen a single arrow, running from one server to another, labelled &quot;Fulfillment&quot;.&nbsp; It so happened that one server was inside my client's company while the other server was in a fulfillment house's company.&nbsp; That little arrow, so critical to customer satisfaction, really represented a Byzantine chain of events that resembled a game of &quot;Mousetrap&quot; more than a single interface.&nbsp; It had messages going to message brokers that appended lines to files, which were later picked up by an hourly job that would FTP the files to the &quot;gateway&quot; server (still inside my client's company.)&nbsp; The gateway server read each line from the file and constructed an XML message, which it then sent via HTTP to the fulfillment house.</p><p><strong>It Stays Up<br /></strong></p><p>We analogize bridge-building as the epitome of engineering. (Side note: I live in the Twin Cities area, so we're a little leery of bridge engineering right now.&nbsp; Might better find another analogy, OK?)&nbsp; Engineering a bridge starts by examining the static and dynamic load factors that the bridge must support: traffic density, weight, wind and water forces, ice, snow, and so on. </p><p>Bridging between two programs should consider static and dynamic loads, too.&nbsp; Instead of just &quot;SOAP-XML over HTTP&quot;, that one little arrow should also say, &quot;Expect one query per HTTP request and send back one response per HTTP reply.&nbsp; Expect up to 100 requests per second, and deliver responses in less than 250 milliseconds 99.999% of the time.&quot;</p><p><strong>It Falls Down</strong></p><p>Building the right failure modes is vital. The last job of any structure is to fall down well. 
The same is true for programs, and for our hardy little arrow.</p><p>The interface needs to define what happens on each end when things come unglued. What if the caller sends more than 100 requests per second? Is it OK to refuse them? Should the receiver drop requests on the floor, refuse politely, or make the best effort possible?</p><p>What should the caller do when replies take more than 250 milliseconds? Should it retry the call? Should it wait until later, or assume the receiver has failed and move on without that function?</p><p>What happens when the caller sends a request with version 1.0 of the protocol and gets back a reply in version 1.1? What if it gets back some HTML instead of XML?&nbsp; Or an MP3 file instead of XML?</p><p>When a bridge falls down, it is shocking, horrifying, and often fatal. Computers and networks, on the other hand, fall down all the time.&nbsp; They always will.&nbsp; Therefore, it's incumbent on us to ensure that individual computers and networks fail in predictable ways. We need to know what happens to that arrow when one end disappears for a while.</p><p><strong>In the White Space</strong></p><p>This, then, is the essence of engineering in the white space. Decide what kind of load that arrow must support.&nbsp; Figure out what to do when the demand is more than it can bear.&nbsp; Decide what happens when the substrate beneath it falls apart, or when the duplicitous rectangle on the other end goes bonkers.</p><p>Inside the boxes, we find art.</p><p>The arrows demand engineering.</p><p>&nbsp;</p> 
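<p>One way to picture the arrow's fine print is as a small piece of data plus two policies. The sketch below is Python, and every name in it (<code>ArrowContract</code>, <code>LoadShedder</code>, <code>call_with_deadline</code>) is hypothetical. It assumes the receiver refuses excess requests rather than queueing them, and that the caller moves on with a fallback value when a reply is late. Those are only one possible set of answers to the questions above.</p>
<pre><code class="language-python">
import concurrent.futures
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrowContract:
    """The load the arrow promises to carry (the figures used in this post)."""
    max_requests_per_second: int = 100
    reply_deadline_seconds: float = 0.250


class LoadShedder:
    """Receiver side: a fixed one-second window that refuses excess requests."""

    def __init__(self, contract, clock=time.monotonic):
        self.contract = contract
        self.clock = clock
        self.current_second = None
        self.remaining = 0

    def admit(self):
        second = int(self.clock())
        if second != self.current_second:
            # A new one-second window begins: refill the allowance.
            self.current_second = second
            self.remaining = self.contract.max_requests_per_second
        if self.remaining == 0:
            return False  # over the agreed rate: refuse instead of queueing
        self.remaining -= 1
        return True


def call_with_deadline(executor, remote_call, contract, fallback):
    """Caller side: wait only as long as the contract allows, then move on."""
    future = executor.submit(remote_call)
    try:
        return future.result(timeout=contract.reply_deadline_seconds)
    except concurrent.futures.TimeoutError:
        # The reply is late. The worker thread keeps running in the
        # background; this sketch simply stops waiting for it.
        return fallback
</code></pre>
<p>Nothing here handles version mismatches or an MP3 where XML was expected; those checks belong in whatever parses the reply. The sketch only shows that the load and timing questions can be answered in code rather than left to chance.</p>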
]]></content></entry><entry><title>On the Widespread Abuse of SLAs</title><link href="https://michaelnygard.com/blog/2007/08/on-the-widespread-abuse-of-slas/"/><id>https://michaelnygard.com/blog/2007/08/on-the-widespread-abuse-of-slas/</id><published>2007-08-06T13:55:45-05:00</published><updated>2007-08-06T13:55:45-05:00</updated><content type="html"><![CDATA[<p>Technical terminology sneaks into common use. Terms such as &quot;bandwidth&quot; and &quot;offline&quot; get used and abused, slowly losing touch with their original meaning. (&quot;Bandwidth&quot; has suffered multiple drifts. It started out in radio, not computer networking, let alone the idea of &quot;personal attention space&quot;.) It is the nature of language to evolve, so I would have no problem with this linguistic drift, if it were not for the way that the mediocre and the clueless clutch to these seemingly meaningful phrases.</p><p>The latest victim of this linguistic vampirism is the &quot;Service Level Agreement&quot;. This term, birthed in IT governance, sounds wonderful. It sounds formal and official.<br /></p><p>An example of the vulgar usage: &quot;I have a five-day SLA.&quot;</p><p>It sounds so very proactive and synergistic and leveraged, doesn't it? Theoretically, it means that we've got an agreement between our two groups; I am your customer and you commit to delivering service within five days.<br /></p><p>A real SLA has important dimensions that I never see addressed with internal &quot;organizational&quot; SLAs.</p><p>First, boundaries. </p><p>When does that five day clock begin ticking? Is it when I submit my request to the queue? Or, is it when someone from your group picks the request up from the queue? If the latter, then how long do requests sit in queue before they get picked up? What's the best case? Worst case? Average?</p><p>When does the clock stop ticking? If you just say, &quot;not approved&quot; or &quot;needs additional detail&quot;, does that meet your SLA? 
Do I have to resubmit for the next iteration, with a whole new five day clock? Or, does the original five day SLA run through <em>resolution</em> rather than just <em>response</em>?</p><p><strong>An internal SLA must begin with submission into the request queue and end when the request is fully resolved.</strong></p><p>Second, measurement and tracking.</p><p>How often do you meet your internal SLA? 100% of the time? 95% of the time? 50% of the time? Unless you can tell me your &quot;on-time performance&quot;, there's no way for me to have confidence in your SLA.</p><p>How many requests have to be escalated or prioritized in order to meet SLA? Do any non-escalated requests actually get resolved within the allotted time?</p><p>How well does your on-time performance correlate with the incoming workload? If the request volume goes up by 25%, but your on-time performance does not change, then your SLA is too loose.</p><p><strong>An SLA must be tracked and trended. It must be correlated with demand metrics.</strong></p><p>Third, consequences.</p><p>If there is no penalty, then there is no SLA. In fact, the IT Infrastructure Library considers penalties to be the defining characteristic of SLAs. (Of course, ITIL also says that SLAs are only possible with external suppliers, because it is only with external suppliers that you can have a contract.)</p><p>When was the last time that an internal group had its budget dinged for breaking an SLA? What would that even mean? How would the health and performance of the whole company be aided by taking resources away from a unit that already cannot perform?&nbsp; The Theory of Constraints says that you devote more resources to the bottleneck, not less. Penalizing you for breaking SLA probably makes your performance worse, not better.</p><p>(External suppliers are different because a) you're paying them, and b) they have a profit margin. 
I doubt the same is true for your own internal groups.)</p><p><strong>If there's no penalty, then it's not an SLA.</strong></p><p>Fourth, consent.</p><p>SLAs are defined by joint consent of both the supplier and consumer of the service. As a subscriber to your service, I can make economic judgments about how much to pay for what level of service. You can make economic judgments about how well you can deliver service at the required level for the offered payment. </p><p>When are internal &quot;service level agreements&quot; actually an &quot;agreement&quot;? Never. I always see SLAs being imposed by one group upon all of their subscribers.</p><p><strong>An SLA must be an agreement, not a dictum.</strong></p><p>&nbsp;</p><p>If any of these conditions are not met, then it's not really an SLA. It's just a &quot;best effort response time&quot;. As a consumer, and sometimes victim, of the service, I cannot plan to the SLA time. Rather, I must manage around it. Calling a &quot;best effort response time&quot; an &quot;SLA&quot; is just an attempt to deceive both of us.<br /></p><p>&nbsp;</p> 
]]></content></entry><entry><title>Y B Slow?</title><link href="https://michaelnygard.com/blog/2007/07/y-b-slow/"/><id>https://michaelnygard.com/blog/2007/07/y-b-slow/</id><published>2007-07-25T13:07:54-05:00</published><updated>2007-07-25T13:07:54-05:00</updated><content type="html"><![CDATA[<p>I've long been a fan of the <a target="_blank" href="https://getfirebug.com/">Firebug</a> extension for <a target="_blank" href="https://www.mozilla.com/en-US/firefox/">Firefox</a>.&nbsp; It gives you great visibility into the ebb and flow of browser traffic.&nbsp; It sure beats rolling your own SOCKS proxy to stick between your browser and the destination site.</p>

<p>Now, I have to also endorse <a target="_blank" href="https://developer.yahoo.com/yslow/">YSlow</a> from Yahoo.&nbsp; YSlow adds interpretation and recommendations to Firebug's raw data.</p>

<p>For example, when I point YSlow at <a target="_blank" href="https://www.google.com">www.google.com</a>, here's how it &quot;grades&quot; Google's performance:</p>

<p><img border="1" style="float:none;" title="Google gets an A for performance" alt="Google gets an A for performance" src="https://michaelnygard.com/images/blog/yslow/yslow_google.png"  class="inline"/></p>
<p>Not bad.&nbsp; On the other hand, <a target="_blank" href="https://www.target.com">www.target.com</a> doesn't fare as well.</p>

<p><img style="float:none" width="605" height="397" border="1" align="left" title="Target gets an F for performance" alt="Target gets an F for performance" src="https://www.michaelnygard.com/images/blog/yslow/yslow_target.png" /></p>

<p>Along with the high-level recommendations, YSlow will also tally up the page weight, including a nice breakdown of cached versus non-cached requests and download size.</p><p><img width="605" height="397" border="1" style="float:none" align="left" src="https://www.michaelnygard.com/images/blog/yslow/yslow_target_stats.png" alt="Cache stats for Target.com" title="Cache stats for Target.com" /></p>

<p>There are so many good reasons to use this tool. In <a href="https://pragprog.com/titles/mnee2/">Release It</a>, I spend a lot of time talking about the money companies waste on bloated HTML and unnecessary page requests.&nbsp; Fat pages hurt users and they hurt companies.&nbsp; Users don't want to wait for all your extra whitespace, table-formatting, and shims to download.&nbsp; Companies shouldn't have to pay for all the added, useless bandwidth.&nbsp; YSlow is a great tool to help eliminate the bloat, speed up page delivery, and make happy users.</p>
]]></content></entry><entry><title>The 5 A.M. Production Problem</title><link href="https://michaelnygard.com/blog/2007/06/the-5-a.m.-production-problem/"/><id>https://michaelnygard.com/blog/2007/06/the-5-a.m.-production-problem/</id><published>2007-06-25T12:33:26-05:00</published><updated>2007-06-25T12:33:26-05:00</updated><content type="html"><![CDATA[<p>I've got a <a target="_blank" href="http://www.infoq.com/articles/release-it-five-am">new piece</a> up at InfoQ.com, discussing the limits of unit and functional testing:&nbsp;</p><p>&quot;Functional testing falls short, however, when you want to build software to survive the real world. Functional testing can only tell you what happens when all parts of the system are behaving within specification. True, you can coerce a system or subsystem into returning an error response, but that error will still be within the protocol! If you're calling a method on a remote EJB that either returns &quot;true&quot; or &quot;false&quot; or it throws an exception, that's all it will do. No amount of functional testing will make that method return &quot;purple&quot;. Nor will any functional test case force that method to hang forever, or return one byte per second.</p>  <p>One of my recurring themes in <a href="https://pragprog.com/titles/mnee2/"><em>Release It</em></a> is that every call to another system, without exception, will someday try to kill your application. It usually comes from behavior outside the specification. When that happens, you must be able to peel back the layers of abstraction, tear apart the constructed fictions of &quot;concurrent users&quot;, &quot;sessions&quot;, and even &quot;connections&quot;, and get at what's really happening.&quot;</p> 
]]></content></entry><entry><title>ITIL and Extreme Programming</title><link href="https://michaelnygard.com/blog/2007/05/itil-and-extreme-programming/"/><id>https://michaelnygard.com/blog/2007/05/itil-and-extreme-programming/</id><published>2007-05-20T15:37:30-05:00</published><updated>2007-05-20T15:37:30-05:00</updated><content type="html"><![CDATA[<p>Esther Schindler asked if I'd be willing to post my earlier article on <a href="/blog/2007/05/itil-and-xp/">staying agile in the face of ITIL</a> at CIO.com.&nbsp; How could I say no?&nbsp; The piece is here.<br /></p><p>&nbsp;</p> 
]]></content></entry><entry><title>ITIL and XP</title><link href="https://michaelnygard.com/blog/2007/05/itil-and-xp/"/><id>https://michaelnygard.com/blog/2007/05/itil-and-xp/</id><published>2007-05-06T12:57:14-05:00</published><updated>2007-05-06T12:57:14-05:00</updated><content type="html"><![CDATA[<p>The Agile Manifesto is explicit about it. &quot;We value individuals and interactions over processes and tools.&quot; How should an Agile team---more specifically, an XP team---respond to the <a target="_blank" href="http://www.itil-itsm-world.com/index.htm">IT Infrastructure Library</a> (ITIL), then? After all, ITIL takes seven books just to define the customizable framework for the actual practices. An IT organization usually takes at least seven more binders to define its actual processes.</p><p>Can XP and ITIL coexist in the same building, or is XP just incompatible with ITIL? In short: yes, they can coexist.<br /></p><p>ITIL and XP (or agile in general) are not fundamentally incompatible, but there will definitely be an interface between the XP world and the ITIL world. Whether this interface becomes an impedance barrier or not depends entirely on the way that your company chooses to implement ITIL.</p><p>I'll run down the Service Support processes and identify some of the problems I've encountered. (I'm focusing on Service Support because businesses tend to implement these processes first. Few of them get far enough down the road to really attack the Service Delivery processes. It's a shame, because I see a lot of value in the Service Delivery approach.) I will cover the Service Delivery processes in a future article.</p><h1>Service Desk</h1><p>An effective service desk can be a great asset to any team, including an XP team. Getting accurate feedback on issues your users are having can only benefit your development efforts and, ultimately, the users themselves. 
The key here is to make sure that the service desk is well-prepared to accept responsibility for support calls on your app.</p><p>I strongly recommend that you start working with the service desk at least six weeks before your first application release. If the service desk is mature, they'll have job aids for capturing app support needs. These will provide the minimum initial information needed for the knowledge base. The service desk personnel will augment that knowledge base over time with whatever solutions, rumors, superstitions and folk remedies they come up with. Be sure you have access to the knowledge base, so you can help weed out the &quot;false solutions.&quot;<br /></p><p>You also want to get on the distribution list for ticket reports from the service desk. These will tell you what issues your users are encountering. Commonly recurring or high-impact issues should become cards for consideration in your next iteration. This feeds your interface to the Problem Management process.</p><p>If the service desk is not mature, you haven't prepared them well, or they do not perform resolution for application incidents, you will be looped in as part of the Incident Management process, below. 
This has some special challenges.</p><h1>Incident Management</h1><p>ITIL defines an &quot;incident&quot; as any disruption to the normal operation of a system or application.&nbsp; This includes bugs, outages, and even &quot;PEBKAC&quot; problems.&nbsp; The Incident Management process begins with notification of an incident.&nbsp; This can be logged by the service desk in response to a user call.&nbsp; It can even be automatically created by a monitoring system.&nbsp; It ends when normal functioning of the system is restored.</p><p>Note that this <em>does not include</em> root cause analysis or correction!&nbsp; Incident Management is all about restoring service.</p><p>Ideally, the service desk handles the entire Incident Management process and your team will not even need to be involved.&nbsp; In less ideal cases, you may be called on to help resolve &quot;novel&quot; incidents--ones that do not have a solution in the service desk's knowledge base.</p><p>When incidents come into the development room, you have some negative forces to deal with. By definition, the incident needs to be resolved expeditiously, making it both interrupt driven and urgent. Therefore, every incident will automatically split a pair and take somebody off their card. This is damaging to flow.</p><p>In worse cases, the entire team may get derailed and start huddling around the incident. Fire-fighting is exciting quadrant I work. It's natural to get a rush from being the hero. The problem is obvious, though.&nbsp; If the entire team is chasing the incident, nobody is making forward progress on the iteration. If you have a large user community or a lot of incidents, you can lose an entire day---or an entire iteration---before you realize it.<br /></p><p>This can be exacerbated if your service desk <strong>never</strong> resolves application support incidents. In such cases, I recommend the &quot;Designated Sacrifice&quot; pattern. 
Assign one member of the team to handle the &quot;Bat-Phone&quot; calls and be the primary point of contact for incident resolution. This is a crappy job---you get pulled away constantly, can't maintain focus, get almost no card work done---so you'll want to rotate that position frequently. (On the other hand, there is that hero factor that provides some consolation.) Even doing it for one full iteration can be very demoralizing.</p><h1>Problem Management</h1><p>Recurring incidents can be identified as Problems that require correction. This is the job of the Problem Management process.</p><p>Identifying a Problem is often done by the service desk, but it can also come from other quarters. The decision about which Problems require correction often becomes very slow and bureaucratic. This is a process you want to work with very closely. Problem Management typically tolerates a much higher level of outstanding defects than an XP team wants to allow. I've seen teams get chewed out for fixing Problems that weren't scheduled to be addressed for a couple of iterations! Imagine how surreal that meeting feels!</p><p>Problem managers should be encouraged to write cards. Your team should even reserve a fraction of your velocity in each iteration just to handle Problems. You also need to communicate back to the problem managers when Problem cards are completed. Really good Problem Management identifies a few problem states such as &quot;known problem&quot;, &quot;known workaround&quot;, and &quot;known solution&quot;. 
An XP team will typically move through these states pretty quickly.</p><p>Bear in mind that the ITIL definition of Problem Management is all about oversight, not the actual changes needed to fix the problem.&nbsp; The actual changes are deployed as part of Release Management.</p><h1>Change Management</h1><p>No part of ITIL gives more people cold sweats than Change Management.&nbsp; This is the process that so easily slips into heavyweight bureaucracy or, worse, meaningless change advisory board (CAB) meetings.</p><p>Change Management as defined simply means tracking changes and their impact on configuration items, and ensuring that changes are applied in an orderly way.&nbsp; It doesn't have to hurt.</p><p>In reality, however, XP teams will spend a lot of time preparing for CAB meetings. Beware: the XP team may get a bad reputation for creating &quot;too much&quot; change.</p><p>I recommend standardizing your change and deployment process. Get into a regular rhythm of releases and deployments so the CAB just knows to expect that every third Tuesday (or whenever), your team will have a deployment. Standardize the deployment mechanics and system impact statement so you can templatize and re-use your change requests. Familiarity will create confidence with the CAB. Constantly showing them change requests they've never seen before will raise their level of scrutiny.</p><p>Failed changes also trigger more scrutiny. Your XP team will have an advantage here, because your rigorous approach to automated testing will reduce the incidence of failed changes, right?<br /></p><h1>Configuration Management<br /></h1><p>Configuration Management is <em>not</em> the act of changing configuration items. It's the process for tracking planned, executed, and retired configurations. 
As you plan each release, you should identify the configuration items (CIs) that will be affected by the release.</p><p>In a well-executed ITIL rollout, configuration management is vital for change management, incident management, the service desk, and release management. In a poorly-executed ITIL rollout, configuration management doesn't exist, or it only addresses servers or network devices.<br /><br />CM should cover servers, network topology, applications, business processes, documentation, and the dependencies among all of them. That way, a proposed change to one CI (e.g., an upgrade to the front-end firewalls) can be analyzed for its impact on the others. This is CM nirvana, seldom achieved.<br /></p><p>The XP team should have an advantage here again, because you've already broken story cards down to tasks at the beginning of an iteration.  That means you already know which applications and servers will be changed in that iteration.  Roll up a few iterations into a release, and the CIs affected by the release should be well known.</p>

<p>On the other hand, if you've taken XP to its &quot;no documentation&quot; extreme, then you will not have tracked the CIs touched by each iteration.  This underscores a common misinterpretation of XP; it doesn't eschew all documentation, just the documentation that doesn't add value from the customer's perspective.  So, does tracking changes against CIs add value from the customer's perspective?  Not directly, no.  There is an indirect benefit, in that the customer will receive better uptime and performance, but that may seem remote to the team.  The best I can say is that this is one place where you'll have to chalk it up to &quot;necessary overhead&quot;.</p>
<h1>Release Management<br /></h1><p>This is an easy one to integrate with your XP team. Release Management dovetails quite naturally with XP's release planning cycle. Engage early, though, because the ITIL process will likely require longer lead times than your team is used to.<br /></p> 
]]></content></entry><entry><title>Release It holding strong at Amazon</title><link href="https://michaelnygard.com/blog/2007/04/release-it-holding-strong-at-amazon/"/><id>https://michaelnygard.com/blog/2007/04/release-it-holding-strong-at-amazon/</id><published>2007-04-30T13:19:02-05:00</published><updated>2007-04-30T13:19:02-05:00</updated><content type="html"><![CDATA[<p>Well, <a href="https://pragprog.com/titles/mnee2/">Release It</a> continues to hold the #1 spot in Amazon's &quot;Hot New Releases&quot; list for <a target="_blank" href="http://www.amazon.com/gp/new-releases/books/280310/ref=pd_nr_b_nav/103-6242890-7355801?pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_s=left-1&amp;pf_rd_r=0Y1JH5Q6S7B5QM215537&amp;pf_rd_t=2201&amp;pf_rd_p=221591001&amp;pf_rd_i=280309">Design Tools and Techniques</a>.&nbsp; I've even got a couple of five-star reviews... and they weren't written by friends or family.</p> 
]]></content></entry><entry><title>Heads down</title><link href="https://michaelnygard.com/blog/2007/04/heads-down/"/><id>https://michaelnygard.com/blog/2007/04/heads-down/</id><published>2007-04-30T09:44:11-05:00</published><updated>2007-04-30T09:44:11-05:00</updated><content type="html"><![CDATA[<p>I've been quiet lately for a couple of reasons.</p><p>First, I'm thrilled to say that I'm joining the <a href="http://www.nofluffjuststuff.com/" target="_blank">No Fluff, Just Stuff</a> stable of speakers.&nbsp; It's an honor and a pleasure to be invited to keep such company.&nbsp; The flip side is, I'm spending a lot of my free time polishing up my inventory of presentations.&nbsp; More frankly, I'm rebuilding them all with <a href="http://www.apple.com/iwork/keynote/" target="_blank">Keynote</a>.&nbsp; (Brief aside, I'm coming to love Keynote.&nbsp; It has some flaws and annoyances, but the result is worth it!)</p><p>I'll debut the first of these new presentations at OTUG on May 15th.&nbsp; I'll be speaking about &quot;Design for Operations&quot;.&nbsp; The talk will be about 70% from the last part of <a href="https://pragprog.com/titles/mnee2/">Release It</a>, and about 30% original content.&nbsp; OTUG will be giving away a couple of copies of my book, but you have to be there to win!</p><p>Finally, I'm working on an article about performance and capacity management.&nbsp; Most capacity planning work is done entirely within Operations, without much involvement from Development.&nbsp; At the same time, most developers don't have a visceral appreciation for how dramatically the application's efficiency can affect the system's overall profitability.</p><p>This article will show the relationship between application response time, system capacity, and financial success.&nbsp; I'm hoping to include a simulator app for download that you can use to play with different scenarios to see what a dramatic difference 100ms can make.&nbsp;</p> 
]]></content></entry><entry><title>Coach and Team From Same Firm</title><link href="https://michaelnygard.com/blog/2007/04/coach-and-team-from-same-firm/"/><id>https://michaelnygard.com/blog/2007/04/coach-and-team-from-same-firm/</id><published>2007-04-22T15:23:18-05:00</published><updated>2007-04-22T15:23:18-05:00</updated><content type="html"><![CDATA[<p>Is it an antipattern to have a consulting firm provide both the coach and developers?&nbsp; By providing the developers, the firm is motivated to deliver on the project, with coaching as an adjunct.&nbsp; If, instead, the firm provides just the coach, it will be judged by how well the client adopts the process.&nbsp; These two motives can easily conflict.<br /><br />Case in point: at a previous client of mine, my employer was charged with completing the project, using a 50-50 mix of contractors and client developers.&nbsp; My employer, a consulting firm, provided several developers experienced with XP and Scrum, as well as an agile coach.&nbsp; The firm was thus charged with two imperatives: first, deliver the project; second, introduce agile methods within the client.&nbsp; <br /><br />With project success as a requirement, the firm decided to interview the client's developers at the outset of the project. The client's developers (rightly) perceived that they were interviewing for their own jobs.&nbsp; This started a negative dynamic that ultimately resulted in 80% attrition among the client's developers.<br /><br />On a pure coaching engagement, the coach would probably have &quot;made do&quot; with whomever the client provided.&nbsp; <br /><br />We delivered all the features, basically on time, with very high quality. 
Financially speaking, it was a success, generating more orders and more revenue per order than its predecessor.&nbsp; It is harder to say that the engagement as a whole was a success, though.&nbsp; By the end, almost all of the developers were contractors, so the client got their product, but very little adoption of agile methods.<br /><br />Perhaps if the coach and the contract developers had come from different firms, the motivations would not have been as tangled, and more of the client's valuable people would have stayed.&nbsp; The team might not have suffered from the strained, unhealthy environment of the project's early days.<br /><br />Then again, perhaps not.&nbsp; The client may have been expecting that level of attrition. Maybe that's just to be expected when you are trying to bring a random selection of corporate developers over to agile methods, especially if the methods are decreed from above instead of brought upward from the grass roots. Maybe the dynamic would have existed even with a coach that was totally disinterested in the project outcome.</p> 
]]></content></entry><entry><title>Moving Your Home Directory on Leopard</title><link href="https://michaelnygard.com/blog/2007/04/moving-your-home-directory-on-leopard/"/><id>https://michaelnygard.com/blog/2007/04/moving-your-home-directory-on-leopard/</id><published>2007-04-20T17:10:51-05:00</published><updated>2007-04-20T17:10:51-05:00</updated><content type="html"><![CDATA[<p>Since NetInfo Manager is going away under Leopard, we've got a gap in capability. How do you relocate your home directory without the GUI?</p>

<p>There are a few reasons you might want to move your home directory to another volume. For example, you might reinstall your OS frequently. Or, perhaps you just want to keep your data on a bigger disk than the one that came in the machine.  In my case, both.</p>

<p>The venerable NetInfo is being replaced entirely with Directory Services. (Try &quot;man 8 DirectoryServices&quot; for more information.) There's a handy command-line tool you can use to interact with Directory Services.</p>

<p>Let's start by opening up a Terminal window.  (Applications &gt; Utilities &gt; Terminal)  At first, you'll be logged in as yourself, not as root.</p>

<pre>
Last login: Wed Dec 31 18:00:00 on ttyp0
donk:~ mtnygard$ 
</pre>

<p>The first thing is to get out of your home directory, because we're going to move it in about a minute and a half. Make yourself into the root user with &quot;sudo su -&quot;, which also drops you into root's home directory.</p>

<pre>
Last login: Wed Dec 31 18:00:00 on ttyp0
donk:~ mtnygard$ sudo su -
Password:
donk:~ root#
</pre>


<p>Next, fire up &quot;dscl&quot;, the <strong>d</strong>irectory <strong>s</strong>ervices <strong>c</strong>ommand <strong>l</strong>ine. Without arguments, this gives you an interactive, shell-like environment to explore the directory.  It also spews a bunch of help messages.  If you give it &quot;localhost&quot;, then it quietly assumes you wanted to interact with the directory.  </p>

<p>You can list entries, cd around the directory hierarchy, and even create entries or change attributes.</p>

<p>User information is stored under /Local/Users, so we'll cd to that now.</p>

<pre>
donk:~ root# dscl localhost
 > cd /Local/Users
/Local/Users >
</pre>

<p>Now, running &quot;ls&quot; will show you all the users that your machine knows.&nbsp; Try it now.</p>

<pre>
donk:~ root# dscl localhost
 > cd /Local/Users
/Local/Users > ls
_amavisd
_appowner
_appserver
_ard
_calendar
_clamav
_cvs
_cyrus
_eppc
_installer
_jabber
_lp
_mailman
_mcxalr
_mdnsresponder
_mysql
_pcastagent
_pcastserver
_postfix
_qtss
_securityagent
_serialnumberd
_spotlight
_sshd
_svn
_teamsserver
_tokend
_unknown
_update_sharing
_uucp
_windowserver
_www
_xgridagent
_xgridcontroller
daemon
mtnygard
nobody
root
/Local/Users >
</pre>

<p>Holy crap!&nbsp; Who the hell are all these people?</p>

<p>Well, of course, they aren't people.&nbsp; All the usernames starting with an underscore are application IDs.&nbsp; Root, nobody, and daemon are all part of the OS.&nbsp; Once you eliminate them, there should just be the people you've actually created accounts for.&nbsp; If you see any names you don't recognize at this point, this would be a good time to shut off your network connection.</p>

<p>At this point, you could &quot;cd&quot; directly into the entry for your user.&nbsp; It won't show you anything special; users do not have subnodes in the directory.&nbsp; It would set up your context for future commands, limiting them to just that user.&nbsp; In this case, however, we'll stay at /Local/Users and run &quot;cat&quot; on my username.</p>

<pre>
/Local/Users > cat mtnygard
dsAttrTypeNative:_writers_hint: mtnygard
dsAttrTypeNative:_writers_jpegphoto: mtnygard
dsAttrTypeNative:_writers_passwd: mtnygard
dsAttrTypeNative:_writers_picture: mtnygard
dsAttrTypeNative:_writers_realname: mtnygard
dsAttrTypeNative:authentication_authority: ;ShadowHash;
dsAttrTypeNative:generateduid: 7F6A8EDE-63EC-4A34-9391-031A9C77806D
dsAttrTypeNative:gid: 501
dsAttrTypeNative:hint: 
dsAttrTypeNative:home: /Users/mtnygard
dsAttrTypeNative:jpegphoto:
 ffd8ffe0 00104a46 49460001 01000001 00010000 ffdb0043 00020202 ... 7fffd9
dsAttrTypeNative:name: mtnygard
dsAttrTypeNative:passwd: ********
dsAttrTypeNative:picture:
 /Library/User Pictures/Sports/Tennis.tif
dsAttrTypeNative:realname:
 Michael Nygard
dsAttrTypeNative:shell: /bin/bash
dsAttrTypeNative:uid: 501
AppleMetaNodeLocation: /Local/Default
AuthenticationAuthority: ;ShadowHash;
AuthenticationHint: 
GeneratedUID: 7F6A8EDE-63EC-4A34-9391-031A9C77806D
JPEGPhoto:
 ffd8ffe0 00104a46 49460001 01000001 00010000 ffdb0043 00020202 ... 7fffd9
NFSHomeDirectory: /Users/mtnygard
Password: ********
Picture:
 /Library/User Pictures/Sports/Tennis.tif
PrimaryGroupID: 501
RealName:
 Michael Nygard
RecordName: mtnygard
RecordType: dsRecTypeStandard:Users
UniqueID: 501
UserShell: /bin/bash
/Local/Users >
</pre>

<p>Hmm. Seems like it must mean <i>something</i>. This is listing the values of all the attributes of my user profile. It's what I want, but there's a big pile of noise in the middle. That noise is a textual representation of my profile's JPEG.  (I've edited it out of this transcript.) If you scroll up past that, you'll see the attribute of real interest.</p>

<p>The property dsAttrTypeNative:home tells the OS where to find my home directory.</p>
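<p>If you'd rather pull that one attribute out of the noise with a script, here is a minimal Python sketch. It assumes the output looks like the transcript above: &quot;key: value&quot; lines, with some values pushed onto an indented line below the key. The name parse_record is my own invention for illustration, not part of dscl.</p>

```python
def parse_record(text):
    """Parse dscl-style "key: value" output into a dict of value lists.

    Assumes the layout shown in the transcript: a value either follows
    its key on the same line, or sits on indented lines below the key.
    """
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" "):
            # Indented line: a value belonging to the most recent key.
            if current is not None:
                record[current].append(line.strip())
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            if not line.rstrip().endswith(":"):
                continue  # not an attribute line (a prompt, for example)
            key, value = line.rstrip()[:-1], ""
        current = key
        record[current] = [value.strip()] if value.strip() else []
    return record


sample = """\
dsAttrTypeNative:home: /Users/mtnygard
dsAttrTypeNative:realname:
 Michael Nygard
dsAttrTypeNative:shell: /bin/bash
"""

record = parse_record(sample)
print(record["dsAttrTypeNative:home"])  # prints ['/Users/mtnygard']
```

<p>Every value comes back as a list because directory attributes can hold more than one value.</p>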

<p>I can change it with dscl's &quot;change&quot; command. The format of change is a little strange because it has to deal with multi-valued properties (as do all of the directory services commands.)</p>

<pre>
/Local/Users > change mtnygard dsAttrTypeNative:home /Users/mtnygard /Volumes/Data/mtnygard
/Local/Users >
</pre>
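<p>dscl will also take a command as arguments instead of dropping you into its interactive shell, which is handy for scripting. The Python sketch below only builds and prints such a command line; it does not run it. The helper name build_change_command is hypothetical, and the &quot;localhost&quot; datasource and /Local/Users path are simply copied from my transcript, so treat them as assumptions that may differ on other OS releases.</p>

```python
import shlex


def build_change_command(user, attribute, old_value, new_value,
                         datasource="localhost", users_path="/Local/Users"):
    """Build the argv for a one-shot dscl change; nothing is executed here.

    datasource and users_path default to the values in the transcript
    above. They are assumptions and may differ on other OS releases.
    """
    record_path = users_path + "/" + user
    return ["dscl", datasource, "-change",
            record_path, attribute, old_value, new_value]


command = build_change_command(
    "mtnygard", "dsAttrTypeNative:home",
    "/Users/mtnygard", "/Volumes/Data/mtnygard")
print(shlex.join(command))
```

<p>The argument order mirrors the interactive form: record, attribute, old value, new value. To actually apply the change you would hand the list to subprocess.run as root.</p>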

<p>The first parameter is the object to change, the second parameter is the attribute to change. The third parameter is the old value that you want to replace (multi-valued list for each attribute, remember.) Finally, the fourth parameter is the new value you want to set.</p>

<p>Whew.</p>

<p>Not quite done yet, though. I've given the OS a bogus home directory. There's no such directory as /Volumes/Data/mtnygard yet.</p>

<p>To get there, I have to quit dscl and copy my directory from under /Users to the new location. I have to do this as root, but I don't want root to end up owning all my personal stuff. Fortunately, there's a &quot;cp&quot; option for that.</p>

<pre>
donk:~ root# cp -Rp /Users/mtnygard /Volumes/Data/
</pre>
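<p>If you want to double-check that root didn't end up owning anything before you log out, a small script can compare owners between the old and new trees. This is a rough Python sketch, not something I ran as part of the original move; the function name ownership_mismatches is made up for illustration.</p>

```python
import os


def ownership_mismatches(src_root, dst_root):
    """Yield relative paths whose owner or group differs between two trees.

    Paths that exist in src_root but are missing from dst_root are
    reported too. Symlinks are inspected themselves, not followed.
    """
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            try:
                s, d = os.lstat(src), os.lstat(dst)
            except FileNotFoundError:
                yield rel  # missing from the copy (or vanished mid-walk)
                continue
            if (s.st_uid, s.st_gid) != (d.st_uid, d.st_gid):
                yield rel
```

<p>For the move above, you would loop over ownership_mismatches(&quot;/Users/mtnygard&quot;, &quot;/Volumes/Data/mtnygard&quot;) as root and print whatever it yields. An empty result means the owners and groups match.</p>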

<p>Now, we're almost, almost done. Log off and log back on into your roomy new home directory.</p>

<p>Caveats:</p>
<ol><li>I don't know how to do this if you've got a shared directory tree set up.&nbsp; You might have that if you're on a Mac network at work, for example.&nbsp; You should definitely try this at home.</li>
<li>The &quot;cp&quot; command I use will do really funky things if you've got hard links, symlinks, or especially circular symlinks in your home directory.&nbsp; Then again, if you've done that to yourself, you probably know enough Unix to work out your own parameters for &quot;cp&quot;, &quot;tar&quot;, &quot;mv&quot; or &quot;cpio&quot;.</li>
<li>One more thing: I'm not sure if this is from running a developer seed of Leopard, or if it's due to this home directory move technique, but I keep running into permissions problems.  I couldn't automatically install dashboard widgets, for example.  Adium complained that it couldn't create its &quot;sounds&quot; directory.</li>
</ol> 
]]></content></entry><entry><title>What makes a POJO so great, anyway?</title><link href="https://michaelnygard.com/blog/2007/04/what-makes-a-pojo-so-great-anyway/"/><id>https://michaelnygard.com/blog/2007/04/what-makes-a-pojo-so-great-anyway/</id><published>2007-04-15T23:42:04-05:00</published><updated>2007-04-15T23:42:04-05:00</updated><content type="html"><![CDATA[<p>My friend David Hussman once said to me, &quot;The next person that says the word 'POJO' to me is going to get stabbed in the eye with a pen.&quot;&nbsp; At the time, I just commiserated about people who follow crowds rather than making their own decisions.</p><p>David's not a violent person.&nbsp; He's not prone to fits of violence or even hyperbole.&nbsp; What made this otherwise level-headed coach and guru resort to non-approved uses of a Bic?</p><p>This weekend at No Fluff, Just Stuff, I had occasion to contemplate POJOs again.&nbsp; There were many presentations about &quot;me too&quot; web frameworks.&nbsp; These are the latest crop of Java web frameworks that are furiously copying Ruby on Rails features as fast as they can.&nbsp; These invariably make a big deal out of using POJOs for data-mapped entities or for the beans accessed by whatever flavor of page template they use. (See JSF, Seam, WebFlow, Grails, and Tapestry 5 for examples.)<br /></p><p>Mainly, I think the infuriating bit is the use of the word &quot;POJO&quot; as if it's a synonym for &quot;good&quot;.&nbsp; There's nothing inherently virtuous about plain old Java objects.&nbsp; It's a retronym: a name made up for an old thing to distinguish it from the inferior new replacement.</p><p>People only care about POJOs because EJB2 was so unbelievably bad.</p><p>Nobody gives a crap about &quot;POROs&quot; (Plain old Ruby objects) because ActiveRecord doesn't suck.</p> 
]]></content></entry><entry><title>Release It! is shipping</title><link href="https://michaelnygard.com/blog/2007/04/release-it-is-shipping/"/><id>https://michaelnygard.com/blog/2007/04/release-it-is-shipping/</id><published>2007-04-08T19:53:17-05:00</published><updated>2007-04-08T19:53:17-05:00</updated><content type="html"><![CDATA[<p><a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;link_code=as3&amp;camp=211189&amp;creative=373489&amp;creativeASIN=0978739213" target="_blank">Release It</a> is now shipping!&nbsp; People who ordered directly from <a href="https://pragprog.com" target="_blank">The Pragmatic Programmers</a> are receiving their hardcopies now.&nbsp; It will take <a href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;link_code=as3&amp;camp=211189&amp;creative=373489&amp;creativeASIN=0978739213" target="_blank">Amazon</a> and Barnes and Noble a few days or a week to work the inventory through their supply chain, but they should be shipping soon, too!</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Flash Mobs and TCP/IP Connections</title><link href="https://michaelnygard.com/blog/2007/04/flash-mobs-and-tcp/ip-connections/"/><id>https://michaelnygard.com/blog/2007/04/flash-mobs-and-tcp/ip-connections/</id><published>2007-04-08T19:35:43-05:00</published><updated>2007-04-08T19:35:43-05:00</updated><content type="html"><![CDATA[<p>In <a target="_blank" href="http://www.amazon.com/gp/product/0978739213?ie=UTF8&amp;tag=michaelnygard-20&amp;link_code=as3&amp;camp=211189&amp;creative=373489&amp;creativeASIN=0978739213">Release It</a>, I talk about users and the harm they do to our systems.&nbsp; One of the toughest types of user to deal with is the flash mob.&nbsp; A flash mob often results from <a href="/blog/2007/03/selfinflicted-wounds/">Attacks of Self-Denial</a>, like when you suddenly offer a $3000 laptop for $300 by mistake.</p><p>When a flash mob starts to arrive, you will suddenly see a surge of TCP/IP connection requests at your load-distribution layer.&nbsp; If the mob arrives slowly enough (fewer than 1,000 connections per second) then the app servers will be hurt the most.&nbsp; For a really fast mob, like when your site hits the top spot on <a target="_blank" href="http://www.digg.com/">digg.com</a>, you can get way more than 1,000 connections per second.&nbsp; This puts the hurt on your web servers.</p><p>As the TCP/IP connection requests arrive, the server's TCP/IP stack answers each SYN packet with a SYN/ACK and queues the connection for servicing by the application.&nbsp; (There's a third step, the client's final ACK, but we can skip it for the moment.)&nbsp; When the application gets around to calling &quot;accept&quot; on the server socket, it receives the next established connection from that queue.&nbsp; At that point, the server hands the established connection off to a worker thread to process the request.&nbsp; Meanwhile, the thread that accepted the connection goes back to accept the next one.</p><p>Well, when a flash mob arrives, the connection requests arrive faster than 
the application can accept and dispatch them. &nbsp; The TCP/IP stack protects itself by limiting the number of pending connection requests, so if the requests arrive faster than the application can accept them, the queue will grow until the stack has to start refusing connection requests.&nbsp; At that point, your server will be returning intermittent errors and you're already failing.</p><p>The solution is much easier said than done: accept <em>and dispatch</em> connections faster than they arrive.<br /></p><p>Filip Hanik compares some popular open-source servlet containers to see how well they stand up to floods of connection requests.&nbsp; In particular, he demonstrates the value of Tomcat 6's new NIO connector.&nbsp; Thanks to some very careful coding, this connector can accept 4,000 connections in 4 seconds on one server.&nbsp; Ultimately, he gets it to accept 16,000 concurrent connections on a single server.&nbsp; (Not surprisingly, RAM becomes the limiting factor.)</p><p>It's not clear that these connections can actually be serviced at that point, but that's a story for another day.<br /></p> 
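<p>To make the accept-and-dispatch loop concrete, here is a minimal Python sketch of the thread-per-request shape described above. It is an illustration under my own assumptions, not how Tomcat or any particular server does it: the backlog of 128, the pool of 8 workers, and the names handle, serve, and start_server are all made up for the example.</p>

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Worker: service one established connection, then close it."""
    with conn:
        conn.sendall(b"hello\n")


def serve(server, pool, stop):
    """Acceptor: do nothing but accept and hand off, as fast as possible."""
    while not stop.is_set():
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            continue  # wake up periodically to check the stop flag
        except OSError:
            break     # the listening socket was closed
        pool.submit(handle, conn)


def start_server(backlog=128, workers=8):
    """Start listening on a free local port; returns the moving parts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(backlog)         # limit on connections waiting for accept
    server.settimeout(0.2)         # so accept() cannot block forever
    stop = threading.Event()
    pool = ThreadPoolExecutor(max_workers=workers)
    thread = threading.Thread(target=serve, args=(server, pool, stop),
                              daemon=True)
    thread.start()
    return server, stop, pool, thread


if __name__ == "__main__":
    server, stop, pool, thread = start_server()
    port = server.getsockname()[1]
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        print(client.recv(64))
    stop.set()
    thread.join()
    server.close()
    pool.shutdown()
```

<p>The acceptor thread does as little as it can get away with. Anything slow that creeps into that loop directly reduces how fast the pending-connection queue drains, which is exactly the failure a flash mob exposes.</p>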
]]></content></entry><entry><title>Release It! is released!</title><link href="https://michaelnygard.com/blog/2007/03/release-it-is-released/"/><id>https://michaelnygard.com/blog/2007/03/release-it-is-released/</id><published>2007-03-30T14:05:55-05:00</published><updated>2007-03-30T14:05:55-05:00</updated><content type="html"><![CDATA[<p>&quot;Release It!&quot; has been officially announced in this press release.&nbsp; Andy Hunt, my editor, also posted announcements to several mailing lists.</p><p>It's been a long road, so I'm thrilled to see this release.</p><p>When you release a new software system, that's not the end of the process, but just the beginning of the system's life.&nbsp; It is the same thing here.&nbsp; Though it's taken me two years to get this book done and on the market, this is not the end of the book's creation, but the beginning of its life.</p><p>&nbsp;</p> 
]]></content></entry><entry><title>Self-Inflicted Wounds</title><link href="https://michaelnygard.com/blog/2007/03/self-inflicted-wounds/"/><id>https://michaelnygard.com/blog/2007/03/self-inflicted-wounds/</id><published>2007-03-25T12:29:30-05:00</published><updated>2007-03-25T12:29:30-05:00</updated><content type="html"><![CDATA[<p>My friend and colleague Paul Lord said, &quot;Good marketing can kill you at
any time.&quot;</p>

<p>He was describing a failure mode that I discuss in "Release It!: Design and
Deploy Production-Ready Software" as &quot;Attacks of Self-Denial&quot;.
These have all the characteristics of a distributed denial-of-service attack
(DDoS), except that a company asks for it. No, I'm not blaming the victim
for electronic vandalism... I mean, they <em>actually</em> ask for the
attack.</p>

<p>The anti-pattern goes something like this: marketing conceives of a brilliant
promotion, which they send to 10,000 customers. Some of those 10,000 pass
the offer along to their friends. Some of them post it to sites like <a
target="_blank" href="https://www.fatwallet.com/">FatWallet</a> or <a
target="_blank" href="https://www.techbargains.com/">TechBargains</a>. On the
appointed day, hour, and minute, the site has a date with destiny as a million or
more potential customers hit the deep link that marketing sent around in the
email. You know, the one that bypasses the content distribution network,
embeds a session ID in the URL, and uses SSL?</p>

<p>Nearly every retailer I know
has done this to themselves at one point. Two holidays ago, one of my
clients did it to themselves, when they announced that XBox 360 preorders would
begin at a certain day and time. Between actual customers and the amateur
shop-bots that the tech-savvy segment cobbled together, the site got
crushed. (Yes, this was one where marketing sent the deep link that
bypassed all the caching and bot-traps.)</p>

<p>Last holiday, <a target="_blank" href="https://www.amazon.com/">Amazon</a> did
it to themselves when they discounted the
XBox 360 by $300. (What is it about the XBox 360?) They offered a
thousand units at the discounted price and got ten million shoppers. All of
Amazon was inaccessible for at least 20 minutes. (It may not sound like
much, but some estimates say Amazon generates $1,000,000 per hour during the
holiday season, so that 20-minute outage probably cost them more than $300,000!)</p>

<p>In Release It!, I discuss some non-technical ways to mitigate this behavior,
as well as some design and architecture patterns you can apply to minimize damage
when one of these Attacks of Self-Denial occurs.</p>
]]></content></entry><entry><title>Design Patterns in Real Life</title><link href="https://michaelnygard.com/blog/2007/03/design-patterns-in-real-life/"/><id>https://michaelnygard.com/blog/2007/03/design-patterns-in-real-life/</id><published>2007-03-16T16:53:52-05:00</published><updated>2007-03-16T16:53:52-05:00</updated><content type="html"><![CDATA[<p>I've seen walking cliches before.&nbsp; There was this one time in the <a href="http://minneapolis.about.com/cs/shoppingservice/a/skyways.htm" target="_blank">Skyway</a> that I actually saw a guy with a white cane being led by a woman with huge dark sunglasses and a guide dog.&nbsp; Today, though, I realized I was watching a design pattern played out with people instead of objects.<br /><br />I've used the <a target="_blank" href="http://c2.com/cgi/wiki?ReactorPattern">Reactor pattern</a> in my software before.&nbsp; It's particularly helpful when you combine it with non-blocking multiplexed I/O, such as Java's <a target="_blank" href="http://java.sun.com/javase/6/docs/api/java/nio/package-summary.html">NIO package</a>.<br /><br />Consider a server application such as a web server or mail transfer agent.&nbsp; A client connects to a socket on the server to send a request. The server and client talk back and forth a little bit, then the server either processes or denies the client's request.<br /><br />If the server just used one thread, then it could only handle a single client at a time.&nbsp; That's not likely to make a winning product. 
Instead, the server uses multiple threads to handle many client connections.<br /><br />The obvious approach is to have one thread handle each connection.&nbsp; In other words, the server keeps a pool of threads that are ready and waiting for a request.&nbsp; Each time through its main loop, the server gets a thread from the pool and, on that thread, calls the socket &quot;accept&quot; method.&nbsp; If there's already a client connection request waiting, then &quot;accept&quot; returns right away.&nbsp; If not, the thread blocks until a client connects.&nbsp; Either way, once &quot;accept&quot; returns, the server's thread has an open connection to a client.<br /><br />At that point, the thread goes on to read from the socket (which blocks again) and, depending on the protocol, may write a response or exchange more protocol handshaking.&nbsp; Eventually, the demands of protocol satisfied, the client and server say goodbye and each end closes the socket.&nbsp; The worker thread pulls a Gordon Freeman and disappears into the pool until it gets called up for duty again.<br /><br />It's a simple, obvious model.&nbsp; It's also really inefficient.&nbsp; Any given thread spends most of its life doing nothing.&nbsp; It's either blocked in the pool, waiting for work, or it's blocked on a socket &quot;accept&quot;, &quot;read&quot;, or &quot;write&quot; call.<br /><br />If you think about it, you'll also see that the naive server can handle only as many connections as it has threads.&nbsp; To handle more connections, it must fork more threads.&nbsp; Forking threads is expensive in two ways.&nbsp; First, starting the thread itself is slow.&nbsp; Second, each thread requires a certain amount of scheduling overhead.&nbsp; Modern JVMs scale well to large numbers of threads, but sooner or later, you'll still hit the ceiling.<br /><br />I won't go into all the details of non-blocking I/O here.&nbsp; (I can point you to a <a 
href="http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html">decent article</a> on the subject, though.)&nbsp; Its greatest benefit is you do not need to dedicate a thread to each connection.&nbsp; Instead, a much smaller pool of threads can be allocated, as needed, to handle individual steps of the protocol.&nbsp; In other words, thread 13 doesn't necessarily handle the whole conversation. Instead, thread 4 might accept the connection, thread 29 reads the initial request, thread 17 starts writing the response and thread 99 finishes sending the response.<br /><br />This model employs threads much more efficiently.&nbsp; It also scales to many more concurrent requests.&nbsp; Bookkeeping becomes a hassle, though. Keeping track of the state of the protocol when each thread only does a little bit with the conversation becomes a challenge.&nbsp; Finally, the (hideously broken) multithreading restrictions in Java's &quot;selector&quot; API make fully multiplexed threads impossible.<br /><br />The Reactor pattern predates Java's NIO, but works very well here.&nbsp; It uses a single thread, called the Acceptor, to await incoming &quot;events&quot;. This one thread sleeps until <em>any</em> of the connections needs service: either due to an incoming connection request, a socket ready to read, or a socket ready for write.&nbsp; As soon as one of these events occurs, the Acceptor hands the event off to a dispatcher (worker) thread that then processes the event.<br /><br />You can visualize this by sitting in a <a href="http://tgifridays.com/main_flash.html" target="_blank">TGI Friday's</a> or <a href="http://www.chilis.com/" target="_blank">Chili's</a> restaurant.&nbsp; (I'm fond of the crowded little ones inside airports. You know, the ones with a third of the regular menu and a line stretching out the door.&nbsp; Like a home away from home for me lately.) 
The &quot;greeter&quot; accepts incoming connections (people) and hands them off to a &quot;worker&quot; (server).&nbsp; The greeter is then ready for the next incoming request.&nbsp; (The line out the door is the listen queue, in case you're keeping score.)&nbsp; When the kitchen delivers the food, it doesn't wait for the original worker thread.&nbsp; Instead, a different worker thread (a runner) brings the food out to the table.<br /></p><p>I'll keep my eyes open for other examples of object-oriented design patterns in real life--though I don't expect to see many based on polymorphism.<br /></p> 
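<p>For readers who want to see the shape of the pattern in code, here is a minimal sketch using Java NIO. Treat it as an illustration under stated assumptions, not code from any real server: the class name <code>MiniReactor</code> and the echo behavior are made up for the example. To sidestep the selector threading restrictions mentioned above, this variant runs its handlers on the selector thread itself instead of handing events to a worker pool.</p>

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical example class: a single-threaded Reactor that echoes bytes back.
class MiniReactor implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    MiniReactor(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);     // the "greeter"
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select();                 // sleep until any channel needs service
                for (SelectionKey key : selector.selectedKeys()) {
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) accept();
                    else if (key.isReadable()) echo(key);
                }
                selector.selectedKeys().clear();   // mark this batch of events as handled
            }
        } catch (IOException e) {
            // fall through to cleanup
        } finally {
            try { selector.close(); server.close(); } catch (IOException ignored) { }
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;                // nothing was actually waiting
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_READ);
    }

    private void echo(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            if (client.read(buf) == -1) { client.close(); return; } // peer hung up
            buf.flip();
            // Simplification: spin until written; a real server would register OP_WRITE.
            while (buf.hasRemaining()) client.write(buf);
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    // Ask the loop to exit; wakeup() unblocks a pending select().
    void stop() {
        running = false;
        selector.wakeup();
    }
}
```

<p>Passing port 0 lets the operating system pick a free port. Run the reactor on its own thread and call <code>stop()</code> to end the loop. Handing the accept and read events to a pool of workers, as the restaurant does, would be the next step, at the cost of the bookkeeping described earlier.</p>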
]]></content></entry><entry><title>Another Path to a Killer Product</title><link href="https://michaelnygard.com/blog/2007/03/another-path-to-a-killer-product/"/><id>https://michaelnygard.com/blog/2007/03/another-path-to-a-killer-product/</id><published>2007-03-06T10:40:28-06:00</published><updated>2007-03-06T10:40:28-06:00</updated><content type="html"><![CDATA[<p>Give individuals powers once reserved for masses<br /><br />Here's a common trajectory:<br /><br />1. Something is so expensive that groups (or even an entire government) have to share it.&nbsp; Think about mainframe computers in the Sixties.<br /><br />2. The price comes down until a committed individual can own one.&nbsp; Think homebrew computers in the Seventies.&nbsp; The &quot;average&quot; person&nbsp; wouldn't own one, but the dedicated geek-hobbyist would.<br /><br />3. The price comes down until the average individual can own one.&nbsp; Think PCs in the Eighties.<br /><br />4. The price comes down until the average person owns dozens.&nbsp; PCs, game consoles, MP3 players, GPS navigators, laptops, embedded processors in toasters and cars.&nbsp; An average person may have half a dozen devices that once were considered computers.<br /><br />Along the way, the product first gains broader and broader functionality, then becomes more specific and dedicated.<br /><br />Telephones, radios and televisions all followed the same trajectory.&nbsp; You would probably call these moderately successful products.<br /><br />So: find something so expensive that groups have to purchase and share it.&nbsp; Make it cheap enough for a private individual.<br /><br /></p> 
]]></content></entry><entry><title>Quantum Manipulations</title><link href="https://michaelnygard.com/blog/2007/02/quantum-manipulations/"/><id>https://michaelnygard.com/blog/2007/02/quantum-manipulations/</id><published>2007-02-10T14:15:03-06:00</published><updated>2007-02-10T14:15:03-06:00</updated><content type="html"><![CDATA[<p>I work in information technology, but my first love is science.&nbsp; Particularly the hard sciences of physics and cosmology.</p><p>There've been a series of experiments over the last few years that have demonstrated quantum manipulations of light and matter that approach the macroscopic realm.</p><p>A recent result from Harvard (HT to Dion Stewart for the link) has gotten a lot of (incorrect) play.&nbsp; It involves absorbing photons with a Bose-Einstein condensate, then reproducing identical photons at some distance in time and space.&nbsp; I've been reading about these experiments with a lot of interest, along with the experiments going the &quot;other&quot; direction: supraluminal group phase travel.<br /><br />I wish the science writers would find a new metaphor, though.&nbsp; They all talk in terms of &quot;stopping light&quot; or &quot;speeding up light&quot;.&nbsp; None of these have to do with changing the speed of light, either up or down.&nbsp; This is about photons, not the speed of light.<br /><br />In fact, this latest one is even more interesting when you view it in terms of the &quot;computational universe&quot; theory of <a target="_blank" href="http://www.randomhouse.com/kvpa/lloyd/">Seth Lloyd</a>.&nbsp; What they've done is captured the <em>complete</em> quantum state of the photons, somehow 'imprinted' on the atoms in the condensate, then recreated the photons from that quantum state.<br /><br />This isn't mere matter-energy conversion as the headlines have said.&nbsp; It's something much more.<br /><br />The Bose-Einstein condensate can be described as a phase of matter colder than a solid.&nbsp; It's much weirder 
than that, though.&nbsp; In the condensate, all the particles in all the atoms achieve a single wavefunction.&nbsp; You can describe the entire collection of protons, neutrons and electrons as if it were one big particle with its own wavefunction.<br /><br />This experiment with the photons shows that the photons' wavefunctions can be superposed with the wavefunction of the condensate, then later extracted to separate the photons from the condensate.<br /><br />The articles somewhat misrepresent this as being about converting light (energy) to matter, but it's really about converting the photon particles to pure information then using that information to recreate identical particles elsewhere.&nbsp; Yikes! <br /></p> 
]]></content></entry><entry><title>A path to a product</title><link href="https://michaelnygard.com/blog/2007/02/a-path-to-a-product/"/><id>https://michaelnygard.com/blog/2007/02/a-path-to-a-product/</id><published>2007-02-04T10:07:11-06:00</published><updated>2007-02-04T10:07:11-06:00</updated><content type="html"><![CDATA[<p>Here's a &quot;can't lose&quot; way to identify a new product: Enable people to plan ahead less.&nbsp; <br /><br />Take cell phones.&nbsp; In the old days, you had to know where you were going before you left.&nbsp; You had to make reservations from home.&nbsp; You had to arrange a time and place to meet your kids at Disney World.<br /><br />Now, you can call &quot;information&quot; to get the number of a restaurant, so you don't have to decide where you're going until the last possible minute.&nbsp; You can call the restaurant for reservations from your car while you're already on your way.<br /><br />With cell phones, your family can split up at a theme park without pre-arranging a meeting place or time.<br /><br />Cell phones let you improvise with success.&nbsp; Huge hit.<br /><br />GPS navigation in cars is another great example.&nbsp; No more calling AAA weeks before your trip to get &quot;TripTix&quot; maps.&nbsp; No more planning your route on a road atlas.&nbsp; Just get in your car, pick a destination and start driving.&nbsp; You don't even have to know where to get gas or food<br />along the way.<br /><br />Credit and debit cards let you go places without planning ahead and carrying enough cash, gold, or jewels to pay your way.<br /><br />The Web is the ultimate &quot;preparation avoidance&quot; tool.&nbsp; No matter what you're doing, if you have an always-on 'Net connection, you can improvise your way through meetings, debates, social engagements, and work situations.<br /><br />Find another product that lets procrastinators succeed, and you've got a sure winner.&nbsp; There's nothing that people love more than the personal 
liberation of not planning ahead.<br /></p><p>&nbsp;</p> 
]]></content></entry><entry><title>How to become an "architect"</title><link href="https://michaelnygard.com/blog/2007/01/how-to-become-an-architect/"/><id>https://michaelnygard.com/blog/2007/01/how-to-become-an-architect/</id><published>2007-01-28T01:04:40-06:00</published><updated>2007-01-28T01:04:40-06:00</updated><content type="html"><![CDATA[<p>Over at <a href="http://www.theserverside.com/">The Server Side</a>, there's a <a href="http://www.theserverside.com/news/thread.tss?thread_id=44011">discussion</a> about how to become an &quot;architect&quot;.&nbsp; Though TSS comments often turn into a cesspool, I couldn't resist adding my <a href="http://www.theserverside.com/news/thread.tss?thread_id=44011#226312">own two cents</a>.</p><p>I should also add that the title &quot;architect&quot; is vastly overused.&nbsp; It's tossed around like a job grade on the technical ladder: associate developer, developer, senior developer, architect.&nbsp; If you talk to a consulting firm, it goes more like: senior consultant (1 - 2 years experience), architect (3 - 5 years experience), senior technical architect (5+ years experience).&nbsp; Then again, I may just be too cynical.</p><p>There are several qualities that the architecture of a system should be:</p><ol><li>Shared.&nbsp; All developers on the team should have more or less the same vision of the structure and shape of the overall system.</li><li>Incremental.&nbsp; Grand architecture projects lead only to grand failures.</li><li>Adaptable. 
Successful architectures can be used for purposes beyond their designers' original intentions.&nbsp; (Examples: Unix pipes, HTTP, Smalltalk)</li><li>Visible.&nbsp; The &quot;sacred, invisible architecture&quot; will fall into disuse and disrepair.&nbsp; It will not outlive its creator's tenure or interest.</li></ol><p>Is the designated &quot;architect&quot; the only one who can produce these qualities?&nbsp; Certainly not.&nbsp; He/she should be the steward of the system, however, leading the team toward these qualities, along with the other -ilities, of course.</p><p>Finally, I think the most important qualification of an architect should be: someone who has created more than one system and lived with it in production.&nbsp; Note that automatically implies that the architect must have at least <em>delivered</em> systems into production.&nbsp; I've run into &quot;architects&quot; who've never had a project actually make it into production, or if they have, they've rolled off the project---again with the consultants---just as Release 1.0 went out the door.</p><p>In other words, architects should have scars.&nbsp;</p> 
]]></content></entry><entry><title>Planning to Support Operations</title><link href="https://michaelnygard.com/blog/2007/01/planning-to-support-operations/"/><id>https://michaelnygard.com/blog/2007/01/planning-to-support-operations/</id><published>2007-01-14T10:44:14-06:00</published><updated>2007-01-14T10:44:14-06:00</updated><content type="html"><![CDATA[      <p>
In 2005, I was on a team doing application development for a system that would
be deployed to 600 locations.  About half of those locations would not
have network connections.  We knew right away that deploying our
application would be key, particularly since it is a &quot;rich-client&quot; application.  (What we used to call a &quot;fat client&quot;, before they became cool again.)  Deployment had to be done by store
associates, not IT.  It had to be safe, so that a failed deployment
could be rolled back before the store opened for business the next
day.  We spent nearly half of an iteration setting up the installation
scripts and configuration.  We set our continuous build server up to
create the &quot;setup.exe&quot; files on every build.  We did hundreds of test
installations in our test environment.
      </p>
	
      <p>
Operations said that our software was &quot;the easiest installation we've
ever had.&quot; Still, that wasn't the end of it.  After the first update
went out, we asked operations what could be done to improve the
upgrade process.  Over the next three releases, we made numerous
improvements to the installers:
      </p>
	
      <ul>
	<li>Make one &quot;setup.exe&quot; that can install either a server or a client, and have the installer itself figure out which one to do.</li>
	<li>Abort the install if the application is still running. This turned out to be particularly important on the server.</li>
	<li>Don't allow the user to launch the application twice.  Very hard to implement in Java.  We were fortunate to find an <a href="http://www.ej-technologies.com/products/install4j/overview.html">installer package</a> that made this a check-box feature in the build configuration file!</li>
	<li>Don't show a blank Windows command prompt window.  (An artifact of our original .cmd scripts that were launching the application.)</li>
	<li>Create separate installation discs for the two different store brands.</li>
	<li>When spawning a secondary application, force its window to the front, avoiding the appearance of a hang if the user accidentally gives focus to the original window.</li>
      </ul>

      <p>
These changes reduced support call volume by nearly 50%.
      </p>
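      <p>
For teams without an installer that offers the "don't launch twice" check as a built-in feature, one common approximation in plain Java is an exclusive lock on a well-known file. The sketch below is only an illustration of that idea; it is not how our installer package did it, and the class name <code>SingleInstanceGuard</code> and the choice of lock file are assumptions made for the example.
      </p>

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: holds an exclusive lock on a file for the life of the application.
class SingleInstanceGuard implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private SingleInstanceGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a guard if no other instance holds the lock, or null if one does.
    static SingleInstanceGuard tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            lock = ch.tryLock();   // null when another process already holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;           // this same JVM already holds the lock
        }
        if (lock == null) {
            ch.close();
            return null;
        }
        return new SingleInstanceGuard(ch, lock);
    }

    // Release the lock on shutdown so the next launch can succeed.
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

      <p>
At startup the application would call <code>tryAcquire</code> and exit with a friendly message if it gets <code>null</code> back. File locking behavior varies by platform and file system, so this would need testing on the actual store hardware.
      </p>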
	
      <p>
My point is not to brag about what a great job we did.  (Though we did a great job.)  To keep improving our support for operations, we deliberately
set aside a portion of our team capacity each iteration. Operations
had an open invitation to our iteration planning meetings, where they
could prioritize and select story cards the same as our other
stakeholders.  In this manner, we explicitly included Operations as a stakeholder in application construction.  They consistently brought us ideas and requests that we, as developers, would not have come up with.
      </p>

<p>
Furthermore, we forged a strong bond with Operations.  When issues arose---as they always will---we avoided all of the usual finger-pointing.  We reacted as one team, instead of two disparate teams trying to avoid responsibility for the problems.  I attribute that partly to the high level of professionalism in both development and operations, and partly to the strong relationship we created through the entire development cycle.
</p>
 
]]></content></entry><entry><title>&amp;quot;Us&amp;quot; and &amp;quot;Them&amp;quot;</title><link href="https://michaelnygard.com/blog/2007/01/quotusquot-and-quotthemquot/"/><id>https://michaelnygard.com/blog/2007/01/quotusquot-and-quotthemquot/</id><published>2007-01-02T22:34:43-06:00</published><updated>2007-01-02T22:34:43-06:00</updated><content type="html"><![CDATA[  <p>
As a consultant, I've joined a lot of projects, usually not right when
the team is forming.  Over the years, I've developed a few heuristics
that tell me a lot about the psychological health of the
team.  Who lunches together?  When
someone says &quot;whole team meeting,&quot; who is invited?  Listen for the &quot;us
and them&quot; language.  How inclusive is the &quot;us&quot; and who is relegated to
&quot;them?&quot; These simple observations speak volumes about the perception
of the development team.  You can see who they consider their
stakeholders, their allies, and their opponents.
    </p>
    
  <p>
Ten years ago, for example, the users were always &quot;them.&quot;  Testing and
QA was <i>always</i> &quot;them.&quot;  Today, particularly on agile
teams, testers and users often get &quot;us&quot; status.  (As an
aside, this may be why startups show such great productivity in the
early days. The company isn't big enough to allow &quot;us&quot; and &quot;them&quot;
thinking to set in.  Of course, the converse is true as well: &quot;us&quot; and &quot;them&quot; thinking in a startup might be a failure indicator
to watch out for!)  Watch out if an &quot;us&quot; suddenly
becomes &quot;them.&quot; Trouble is brewing!
  </p>
    
  <p>
Any conversation can create a &quot;happy accident;&quot; some understanding
that obviates a requirement, avoids a potential bug, reduces cost, or
improves the outcome in some other way.  Conversations prevented
thanks to an armed-camp mentality are opportunities lost.
  </p>
        
  <p>
One of the most persistent and perplexing &quot;us&quot; and &quot;them&quot; divisions I
see is between development and operations.  Maybe it's due to the high
org-chart distance (OCD) between development groups and operations
groups.  Maybe it's because development doesn't tend to plan as far
ahead as operations does.  Maybe it's just due to a long-term dynamic
of requests and refusals that sets each new conversation up for
conflict. Whatever the cause, two groups that should absolutely be
working as partners often end up in conflict, or worse, barely
speaking at all.
  </p>

  <p>
This has serious consequences.  People in the &quot;us&quot; tent get their
requests built very quickly and accurately.  People in the &quot;them&quot; tent
get told to write specifications.  Specifications have their place.
Specifications are great for the fourth or fifth iteration of a well-defined
process.  During development, though, ideas need to be explored, not
specified.  If a developer has a vague idea about using the storage
area network to rapidly move large volumes of data from the content
management system into production, but he doesn't know how to write
the request, the idea will wither on the vine.
  </p>

  <p>
The development-operations divide virtually ensures that applications
will not be transitioned to operations as effectively as possible.
Some vital bits of knowledge just don't fit into a document template.
For example, developers have knowledge about the internals of the
application that can help diagnose and recover from system failures.
(Developer: &quot;Oh, when you see all the request handling threads
blocked inside the K2 client library, just bounce the search servers.
The app will come right back.&quot;  Operations: &quot;Roger that.  What's a
thread?&quot;) These gaps in knowledge degrade uptime, either by extending
outages or preventing operations from intervening.  If the company
culture is at all political, one or two incidents of downtime will be
enough to start the finger-pointing between development and
operations.  Once that corrosive dynamic gets started, nothing short
of changing the personnel or the leadership will stop it.
</p> 
]]></content></entry><entry><title>Inviting Domestic Disaster</title><link href="https://michaelnygard.com/blog/2006/12/inviting-domestic-disaster/"/><id>https://michaelnygard.com/blog/2006/12/inviting-domestic-disaster/</id><published>2006-12-26T18:22:03-06:00</published><updated>2006-12-26T18:22:03-06:00</updated><content type="html"><![CDATA[
<p>We had a minor domestic disaster this morning.  It's not unusual.  With four children, there's always some kind of crisis.  Today, I followed a trail of water along the floor to my youngest daughter.  She was shaking her &quot;sippy cup&quot; upside down, depositing a full cup of water on the carpet... and on my new digital grand piano.</p>

<p>Since the entire purpose of the &quot;sippy cup&quot; is to contain the water,
not to spread it around the house, this was perplexing.</p>

<p>On investigation, I found that this failure in function actually mimicked common dynamics of major disasters. In &quot;Inviting Disaster&quot;, James R. Chiles describes numerous mechanical and industrial disasters, each with a terrible cost in lives.  In <a href="https://pragprog.com/titles/mnee2/">Release It</a>, I discuss software failures that cost millions of dollars---though, thankfully, no lives.  None of these failures come as a bolt from the blue.  Rather, each one has precursor incidents: small issues whose significance is only obvious in retrospect.  Most of these chains of events also involve humans and human interaction with the technological environment.</p>

<p>The proximate cause of this morning's problem was inside the sippy cup itself. The removable valve was inserted into the lid backwards, completely negating its purpose.  A few weeks earlier, I had pulled a sippy cup from the cupboard with a similarly backward valve.  I knew it had been assembled by my oldest, who has the job of emptying the dishwasher, so I made a mental note to provide some additional instruction.  Of course, mental notes are only worth the paper they're written on.  I never did get around to speaking with her about it.</p>

<p>Today, my wonderful mother-in-law, who is visiting for the holidays, filled the cup and gave it to my youngest child.  My mother-in-law, not having dealt with thousands of sippy cup fillings, as I have, did not notice the reversed valve, or did not catch its significance.</p>

<p>My small-scale mess was much easier to clean up than the disasters in &quot;Release It!&quot; or &quot;Inviting Disaster&quot;.  It shared some similar features, though.  The individual with experience and knowledge to avert the problem--me--was not present at the crucial moment.  The preconditions were created by someone who did not recognize the potential significance of her actions.  The last person who could have stopped the chain of events did not have the experience to catch and stop the problem.  Change any one of those factors and the crisis would not have occurred.</p>
]]></content></entry><entry><title>Book Completed</title><link href="https://michaelnygard.com/blog/2006/12/book-completed/"/><id>https://michaelnygard.com/blog/2006/12/book-completed/</id><published>2006-12-13T20:44:00-06:00</published><updated>2006-12-13T20:44:00-06:00</updated><content type="html"><![CDATA[<p> I'm thrilled to report that my book is now out of my hands and into the hands of copy editors and layout artists. </p><p> It's been a long trip.  At the beginning, I had no idea just how much work was needed to write an entire book.  I started this project 18 months ago, with a sample chapter, a table of contents, and a proposal.  That was a few hundred pages, three titles, and a thousand hours ago. </p><p>  Now &quot;<a href="https://pragprog.com/titles/mnee2/">Release It! Design and Deploy Production-Ready Software</a>&quot; is close to print.  Even in these days of the permanent ephemerance of electronic speech, there's still something incomparably electric about seeing your name in print. </p><p>  Along with publication of the book, I will be making some changes to this blog.  First, it's time to find a real home.  That means a new host, but it should be transparent to everyone but me. Second, I will be adding non-blog content: excerpts from the book, articles, and related content.  (I have some thoughts about capacity management that need a home.)  Third, if there is interest, I will start a discussion group or mailing list for conversation about survivable software.</p> <p>&nbsp;</p> 
]]></content></entry><entry><title>Reflexivity and Introspection</title><link href="https://michaelnygard.com/blog/2006/10/reflexivity-and-introspection/"/><id>https://michaelnygard.com/blog/2006/10/reflexivity-and-introspection/</id><published>2006-10-07T15:22:00-05:00</published><updated>2006-10-07T15:22:00-05:00</updated><content type="html"><![CDATA[<p> A fascinating niche of programming languages consists of those languages which are constructed in themselves.  For instance, <a href="http://www.squeak.org/">Squeak</a> is a Smalltalk whose interpreter is written in Squeak.  Likewise, the best language for writing a LISP interpreter turns out to be LISP itself.  (That one is more like nesting than bootstrapping, but it's closely related.) </p><p> I think Ruby has enough introspection to be built the same way.  Recently, a friend clued me in to PyPy, a Python interpreter written in Python. </p><p> I'm sure there are many others.  In fact the venerable GCC is written in its own flavor of C.  Compiling GCC from scratch requires a bootstrapping phase, by compiling a small version of GCC, written in a more portable form of C, with some other C compiler.  Then, the phase I micro-GCC compiles the whole GCC for the target platform. </p><p> Reflexivity arises when the language has sufficient introspective capabilities to describe itself.  I cannot help but be reminded of <a href="http://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">Godel, Escher, Bach</a> and the difficulties that reflexivity causes.  Godel's Theorem doesn't really kick in until a formal system is complex enough to describe itself.  At that point, Godel's Theorem proves that there will be true statements, expressed in the language of the formal system, that cannot be proven true.  
These are inevitably statements about themselves---the symbolic logic form of, &quot;This sentence cannot be proven.&quot; </p><p> Long-time LISP programmers create works with such economy of expression that we can only use artistic metaphors to describe them.  Minimalist.  Elegant.  Spare.  Rococo. </p><p> <a href="http://en.wikipedia.org/wiki/Forth">Forth</a> was my first introduction to self-creating languages.  FORTH starts with a tiny kernel (small enough that it fit into a 3KB cartridge for my VIC-20) that gets extended one &quot;word&quot; at a time.  Each word adds to the vocabulary, essentially customizing the language to solve a particular problem.  It's really true that in FORTH, you don't write programs to solve problems.  Instead, you invent a language in which solving the problem is trivial, then you spend your time implementing that language. </p><p> Another common aspect of these self-describing languages seems to be that they never become widely popular.  I've heard several theories that attempted to explain this.  One says that individual LISP programmers are so productive that they never need large teams.  Hence, cross-pollination is limited and it is hard to demonstrate enough commercial demand to seem convincing.  Put another way: if you started with equal populations of Java and LISP programmers, demand for Java programmers would quickly outstrip demand for LISP programmers... not because it's a superior language, but just because you <em>need</em> more Java programmers for any given task.  This demand becomes self-reinforcing, as commercial programmers go where the demand is, and companies demand what they see is available. </p><p> I also think there's a particular mindset that admires and relates to the dynamic of the self-creating language.  I suspect that programmers possessing that mindset are also the ones who get excited by metaprogramming. </p>   
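<p> To make the "extended one word at a time" idea from the FORTH paragraph concrete, here is a toy sketch in Java. It is only loosely Forth-like and everything in it is an assumption for illustration: the class name <code>TinyForth</code>, the three built-in words, and the way definitions are stored as token lists instead of being compiled. </p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy interpreter: a tiny kernel plus a growable vocabulary.
class TinyForth {
    private final Deque<Integer> stack = new ArrayDeque<>();
    // User-defined words: name -> the tokens that make up the word's body.
    private final Map<String, String[]> words = new HashMap<>();

    // Interprets one line of whitespace-separated tokens.
    void eval(String line) {
        run(line.trim().split("\\s+"));
    }

    private void run(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.isEmpty()) continue;
            if (t.equals(":")) {
                // ": name body ;" adds a new word to the vocabulary.
                String name = tokens[++i];
                int start = ++i;
                while (!tokens[i].equals(";")) i++;
                words.put(name, Arrays.copyOfRange(tokens, start, i));
            } else if (words.containsKey(t)) {
                run(words.get(t));        // a defined word expands to its body
            } else {
                primitive(t);
            }
        }
    }

    // The "kernel": a few built-in words; anything else is parsed as an integer.
    private void primitive(String t) {
        switch (t) {
            case "+": stack.push(stack.pop() + stack.pop()); break;
            case "*": stack.push(stack.pop() * stack.pop()); break;
            case "dup": stack.push(stack.peek()); break;
            default: stack.push(Integer.parseInt(t));
        }
    }

    // Returns the value on top of the stack without removing it.
    int top() {
        return stack.peek();
    }
}
```

<p> Feeding it <code>: square dup * ;</code> and then <code>: cube dup square * ;</code> grows the vocabulary by two words, after which <code>3 cube</code> leaves 27 on the stack. The kernel never changed; the language did. </p>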
]]></content></entry><entry><title>Education as mental immune system</title><link href="https://michaelnygard.com/blog/2006/09/education-as-mental-immune-system/"/><id>https://michaelnygard.com/blog/2006/09/education-as-mental-immune-system/</id><published>2006-09-25T17:22:00-05:00</published><updated>2006-09-25T17:22:00-05:00</updated><content type="html"><![CDATA[<p> Education and intelligence act like a memetic immune system.  For instance, anyone with knowledge of chemistry understands that &quot;binary liquid explosives&quot; are a movie plot, not a security threat.  On the other hand, lacking education, TSA officials told a woman in front of me to throw away her Dairy Queen ice cream cones before she could board the plane.  Ice cream. </p><p> How in the hell is anyone supposed to blow up a plane with ice cream?  It defies imagination. </p><p> She was firmly and seriously told, &quot;Once it melts, it will be a liquid and all liquids and gels are banned from the aircraft.&quot; </p><p> I wanted to ask him what the TSA's official position was on colloidal solids.  They aren't gels or liquids, but amorphous liquids trapped in a suspension of solid crystals.  Like a creamy mixture of dairy fats, egg yolks, and flavoring trapped in a suspension of water ice crystals. </p><p> I didn't of course.  I've heard the chilling warnings, &quot;Jokes or inappropriate remarks to security officials will result in your detention and arrest.&quot;  (Real announcement.  I heard it in Houston.)  In other words, mouth off about the idiocy of the system and you'll be grooving to Britney Spears in Gitmo. </p><p> On the other hand, there are other ideas that <span style="font-weight: bold">only</span> make sense if you're overly educated.  
Dennis Prager is fond of saying that you have to go to graduate school to believe things like, &quot;The Republican party is more dangerous than Hizbollah.&quot; </p><p> Of course, I don't think he's really talking about post-docs in Chemical Engineering. </p>  
]]></content></entry><entry><title>Expressiveness, revisited</title><link href="https://michaelnygard.com/blog/2006/05/expressiveness-revisited/"/><id>https://michaelnygard.com/blog/2006/05/expressiveness-revisited/</id><published>2006-05-05T21:28:00-05:00</published><updated>2006-05-05T21:28:00-05:00</updated><content type="html"><![CDATA[<p>
 I <a href="/blog/2005/12/ruby-expressiveness-and-repeat/">previously</a> mused about the expressiveness of Ruby compared to Java. Dion Stewart pointed me toward F-Script, an interpreted, Smalltalk-like scripting language for Mac OS X and Cocoa.  In F-Script, invoking a method on every object in an array is built-in syntax.  Assume that updates is an array containing objects that understand the preProcess and postProcess messages: </p>

<pre>
updates preProcess
updates postProcess
</pre>


<p> That's it.  Iterating over the elements of the collection is automatic. </p>

<p> F-Script admits much more sophisticated array processing; multilevel iteration, row-major processing, column-major processing, inner products, outer products, &quot;compression&quot; and &quot;reduction&quot; operations.  The most amazing thing is how natural the idioms look, thanks to their clean syntax and the dynamic nature of the language. </p>

<p> It reminds me of a remark about General Relativity, that economy of expression allowed vast truths to be stated in one simple, compact equation.  It would, however, require fourteen years of study to understand the notation used to write the equation, and that one could spend a lifetime understanding the implications. </p>
]]></content></entry><entry><title>Inviting Disaster</title><link href="https://michaelnygard.com/blog/2006/01/inviting-disaster/"/><id>https://michaelnygard.com/blog/2006/01/inviting-disaster/</id><published>2006-01-11T19:36:00-06:00</published><updated>2006-01-11T19:36:00-06:00</updated><content type="html"><![CDATA[<p> I'm reading a fabulous book called &quot;Inviting Disaster&quot;, by James
R. Chiles.  He discusses hundreds of engineering and mechanical disasters.  Most
of them caused serious loss of life. </p>

<p> There are several common themes: </p>

<p> 1. Enormously complex systems that react in sometimes unpredictable ways </p>

<p> 2. Inadequate testing, training, or preparedness for failures -- particularly
for multiple concurrent failures </p>

<p> 3. A chain of events leading to the &quot;system fracture&quot;.  Usually
exacerbated by human error </p>

<p> 4. Politics or budget pressure causing otherwise responsible people to rush
things out.  This often involves whitewashing or pooh-poohing legitimate
criticism and concern from experts involved. </p>

<p> The parallels to some projects I've worked on are kind of eerie.
Particularly when he's talking about things like the DC-10 and the Hubble Space
Telescope.  In both of those cases, warning signs were visible during the
construction and early testing, but because each of the people involved had
tunnel vision limited to that person's silo, the clues got missed. </p>

<p> The scary part is that there is no solution here.  Sometimes, you can't even
place the blame very squarely.  When half-a-dozen people were involved with
unloading and handling of oxygen-generating cylinders on a ValuJet flight, no
single individual really did something wrong (or contrary to procedure, anyway).
Still, the net effect of their actions cost the lives of every single person on
that flight. </p>

<p> It's grim stuff, but it ought to be required reading.  If you ever leave your
house again, you'll be much better prepared for building and operating complex
systems. </p>
]]></content></entry><entry><title>New Interview Question</title><link href="https://michaelnygard.com/blog/2005/12/new-interview-question/"/><id>https://michaelnygard.com/blog/2005/12/new-interview-question/</id><published>2005-12-26T22:53:00-06:00</published><updated>2005-12-26T22:53:00-06:00</updated><content type="html"><![CDATA[<p> So many frameworks... so much alphabet soup on the resumes. </p><p> Anyone that ever reads <a href="http://www.theserverside.com/">The Server Side</a> or <a href="http://www.monster.com">Monster.com</a> knows exactly which boxes to hit when they're writing a resume.  The recruiters telegraph their needs a mile away.  (Usually because they couldn't care less about the differences or similarities between Struts, JSF, WebWork, etc.)  As long as the candidate knows how to spell Spring and Hibernate, they'll get submitted to the &quot;preferred vendor&quot; system. </p><p> Being one of those candidates is tough, but that's not the part I'm concerned about now.  I'm interested in weeding out the know-nothings, the poseurs, and the fast talkers. </p><p> When I'm interviewing somebody, my main criterion is this: would I want to work on a two-person project with this candidate?  My secondary criterion is &quot;Would I feel comfortable leaving this person alone at a client site?  Will they deliver value to the client?  Will they look like an idiot, and by extension, make me look like an idiot?&quot; </p><p> My friend Dion Stewart had a great idea for a weed-out question.  No matter what frameworks the candidate shows on the resume, ask them what they disliked the most about the framework.  (I have my top three list for each framework I've worked in... except NeXT's Enterprise Objects Framework.  But that's another story.) </p><p> If they can't answer at all, then they haven't actually worked with the framework.  They're just playing buzzword bingo. </p><p> If they answer, but it sounds like bullshit, then odds are they're bullshitting you. 
</p><p> If they have never thought about it, haven't formed an opinion, or say &quot;it's all good&quot;, then they lack passion about what they do. </p><p> A candidate that is driven, that cares about the quality-without-a-name should be able to go on a rant about something in each framework they've actually worked with.  In fact, you've really hit the jackpot if your candidate <em>can</em> go on a rant, but does it in a professional, reasoned way.  I love to see a candidate that can show some fire without seeming like a loon.  That's when I can see how they'll react when the client makes a decision the candidate considers boneheaded.  (I've seen some spectacular pyrotechnics from consultants that forgot whose money they're spending.  But that's another story.) </p>]]></content></entry><entry><title>JAI 1.1.3 in beta</title><link href="https://michaelnygard.com/blog/2005/12/jai-1.1.3-in-beta/"/><id>https://michaelnygard.com/blog/2005/12/jai-1.1.3-in-beta/</id><published>2005-12-22T11:46:00-06:00</published><updated>2005-12-22T11:46:00-06:00</updated><content type="html"><![CDATA[<p> I've been using JAI 1.1.2 for the past year.  It's an incredibly powerful tool, though I will confess that the API is more than a bit quirky. </p><p> Early this year, Sun made JAI an open-source project available at java.net.  That project has been working on the 1.1.3 release for most of the year.  It's now in beta, with a few enhancements and a lot of bug fixes. </p><p> The most significant enhancement is that JAI can now be used with Java WebStart.  Previously it had to be installed as a JRE extension. </p><p> Also, one of the big bugs is fixed.  Issue #13 is fixed in the beta.  It could cause the JPEG codec to use excessive amounts of memory when decoding large untiled images.  (Which we do in our app a lot!) 
</p>]]></content></entry><entry><title>Ruby expressiveness and repeating yourself</title><link href="https://michaelnygard.com/blog/2005/12/ruby-expressiveness-and-repeating-yourself/"/><id>https://michaelnygard.com/blog/2005/12/ruby-expressiveness-and-repeating-yourself/</id><published>2005-12-10T19:57:00-06:00</published><updated>2005-12-10T19:57:00-06:00</updated><content type="html"><![CDATA[<p> Just this week, I was reminded again of how Java forces you to repeat yourself.  I had an object that contains a sequence of &quot;things to be processed&quot;.  The sequence has to be traversed twice, once before an extended process runs and once afterwards. </p>

<p> The usual Java idiom looks like this: </p>  

<pre>
public void preProcess(ActionContext context) {
  for (Iterator iter = updates.iterator(); iter.hasNext(); ) {
    TwoPhaseUpdate update = (TwoPhaseUpdate) iter.next();
    update.preProcess(context);
  }
}

public void postProcess(ActionContext context) {
  for (Iterator iter = updates.iterator(); iter.hasNext(); ) {
    TwoPhaseUpdate update = (TwoPhaseUpdate) iter.next();
    update.postProcess(context);
  }
}

</pre> 

<p> Notice that there are only two symbols different between these two methods, out of 20 semantically significant symbols.  According to the <a href="https://pragprog.com">Pragmatic Programmers</a>, even iterating over the collection counts as a kind of repetition (and therefore a violation of DRY - don't repeat yourself.) </p>

<p> The Ruby equivalent would be something like: </p>  

<pre>
def preProcess(context)
   updates.each { |u| u.preProcess(context) }
end

def postProcess(context)
   updates.each { |u| u.postProcess(context) }
end
</pre>

<p> Now, there are two differing symbols out of 10 (20% variance instead of 10%).  There's been no loss of expressiveness.  In fact, the main intention of the code is clearer in the Ruby version than in the Java version. </p>

<p> Can we make the variance higher?  Perhaps. </p>

<pre>
def preProcess(context)
   each_update(:preProcess, context)
end

def postProcess(context)
   each_update(:postProcess, context)
end

def each_update(method, context)
   updates.each { |u| u.send(method, context) }
end 
</pre>

<p> Now the two primary methods differ in 2 symbols out of 7, or nearly 29%.  The expressiveness is damaged a little bit by the dynamic dispatch via &quot;send&quot;.  It would be unthinkable to use reflection in Java to make the code <em>clearer</em>.  (Anyone who's worked with reflection knows what I mean.) Here, it's not unthinkable, but it might just not help clarity.</p>
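<p> For contrast, here is a minimal sketch of what the same trick might look like with Java reflection.  It is an illustration only, not code from the project: the RecordingUpdate class, the eachUpdate helper, and the String context are hypothetical stand-ins for TwoPhaseUpdate and ActionContext. </p>

```java
import java.lang.reflect.Method;

class ReflectiveDispatch {
    // Hypothetical stand-in for TwoPhaseUpdate; it only records which calls it received.
    public static class RecordingUpdate {
        public final StringBuilder log = new StringBuilder();

        public void preProcess(String context) {
            log.append("pre:").append(context).append(';');
        }

        public void postProcess(String context) {
            log.append("post:").append(context).append(';');
        }
    }

    // Rough analogue of the Ruby each_update: look the method up by name on
    // each element and invoke it with the context.
    public static void eachUpdate(Object[] updates, String methodName, String context) {
        for (Object update : updates) {
            try {
                Method method = update.getClass().getMethod(methodName, String.class);
                method.invoke(update, context);
            } catch (ReflectiveOperationException e) {
                // Reflection turns a compile-time error into a runtime one.
                throw new IllegalStateException("Cannot call " + methodName, e);
            }
        }
    }

    public static void main(String[] args) {
        RecordingUpdate update = new RecordingUpdate();
        Object[] updates = { update };
        eachUpdate(updates, "preProcess", "ctx");
        eachUpdate(updates, "postProcess", "ctx");
        System.out.println(update.log); // prints pre:ctx;post:ctx;
    }
}
```

<p> The lookup, the checked exceptions, and the loss of compile-time checking on the method name add up to far more ceremony than the one-line Ruby version needs. </p>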

]]></content></entry><entry><title>MySQL 5.0 Stored Procedures</title><link href="https://michaelnygard.com/blog/2005/10/mysql-5.0-stored-procedures/"/><id>https://michaelnygard.com/blog/2005/10/mysql-5.0-stored-procedures/</id><published>2005-10-15T16:05:00-05:00</published><updated>2005-10-15T16:05:00-05:00</updated><content type="html"><![CDATA[<p> The <a href="https://www.mysql.com">MySQL 5.0</a> release is finally adding stored procedures, triggers, and views.  This is a welcome addition.  With the strong storage management features, clustering, and replication from the 4.x releases, MySQL now has all the capabilities of an &quot;enterprise&quot; database.  (Of course, the lack of these features didn't stop thousands of users from deploying earlier versions in enterprises, even for &quot;mission-critical&quot; applications.)* </p>

<p> Here's a fairly trivial example: </p> 

<pre>
create procedure count_table_rows ()  reads sql data begin
      select table_name, table_rows from information_schema.tables;
end
</pre>

<p> * Sometime, I have to post about the perversions of language perpetrated by people in business.  &quot;Mission-critical&quot; means &quot;without this, the mission will fail.&quot;  What percentage of applications labelled as mission-critical would <em>actually</em> cause the company to fail?  Most of the time, the &quot;mission-critical&quot; label really just means &quot;this application's sponsor has large political clout&quot;. </p>

]]></content></entry><entry><title>The dumbest thing I've seen today</title><link href="https://michaelnygard.com/blog/2005/10/the-dumbest-thing-ive-seen-today/"/><id>https://michaelnygard.com/blog/2005/10/the-dumbest-thing-ive-seen-today/</id><published>2005-10-06T11:19:00-05:00</published><updated>2005-10-06T11:19:00-05:00</updated><content type="html"><![CDATA[<p> I generally like Swing, but I just found something in the Metal L&amp;F for JSlider that strikes me as a big WTF. The BasicSliderUI allows you to click in the &quot;track&quot; of the slider to scroll by a block.  That's either 10% of the span of the slider, or a minimum of 1 unit. The MetalSliderUI <em>overrides</em> that sensible behavior with a method that scrolls by just one unit.  Period. </p>

<p> Here's a quick fix: </p>  
<pre>
JSlider slider = new JSlider(); 
slider.setUI(new MetalSliderUI() {
  protected void scrollDueToClickInTrack(int dir) {
    scrollByBlock(dir);
  }
});
</pre>

]]></content></entry><entry><title>Programmer productivity measurements don't work.</title><link href="https://michaelnygard.com/blog/2005/09/programmer-productivity-measurements-dont-work./"/><id>https://michaelnygard.com/blog/2005/09/programmer-productivity-measurements-dont-work./</id><published>2005-09-08T21:36:00-05:00</published><updated>2005-09-08T21:36:00-05:00</updated><content type="html"><![CDATA[<p> Programmer productivity measurements don't work. </p><p> The most common metric was discredited decades ago, but continues to be used: KLOC. Only slightly better is function points.  At least it's tied to some deliverable value. Still, the best function point is the one you don't have to develop.  Likewise, the best line of code is the one you don't need to write. In fact, sometimes my most productive days are the ones in which I delete the most code. Why are these metrics so misleading? </p><p> Because they are <em>counting inventory as an asset</em>.  Lines of code are inventory.  Function points are inventory.  Any metric that only measures the rate of inventory production is fatally flawed.  We need metrics that measure throughput instead. </p>]]></content></entry><entry><title>More Beanshell Goodness</title><link href="https://michaelnygard.com/blog/2005/06/more-beanshell-goodness/"/><id>https://michaelnygard.com/blog/2005/06/more-beanshell-goodness/</id><published>2005-06-11T22:15:00-05:00</published><updated>2005-06-11T22:15:00-05:00</updated><content type="html"><![CDATA[<p> Thanks to the clean layered architecture in our application, we've got a very clear interface between the user interface (just Swing widgets) and the &quot;UI Model&quot;.  In the canonical MVC mode, our UI Model is part controller and part model.  It isn't the domain model, however.  It's a model of the user interface.  It has concepts like &quot;form&quot; and &quot;command&quot;.  A &quot;form&quot; is mainly a collection of property objects that are named and typed.  
The UI interacts with the rest of the application by binding to the properties. </p><p> The upshot is that anything the UI can do by setting and getting properties (including executing commands via CommandProperty objects) can be done through test fixtures or automated interfaces.  Enter beanshell. </p><p> After integrating beanshell, all of our forms and properties were immediately available.  Today, I worked with one of my teammates to build a beanshell script to drive through the application.  It creates a customer and goes through the entire workflow.  Run the script a million times or so, and you've got a great pile of test data.  Schema changes?  Domain model changes?  No problem.  Just re-run the script (and wait an hour or so) and you've got updated test data. </p>]]></content></entry><entry><title>Smalltalk style prototyping for Java?</title><link href="https://michaelnygard.com/blog/2005/05/smalltalk-style-prototyping-for-java/"/><id>https://michaelnygard.com/blog/2005/05/smalltalk-style-prototyping-for-java/</id><published>2005-05-29T13:15:00-05:00</published><updated>2005-05-29T13:15:00-05:00</updated><content type="html"><![CDATA[<p> I've been eyeing <a href="http://www.beanshell.org">Beanshell</a> for some time now.  It's a very straightforward scripting language for Java.  Its syntax is about what you would expect if I said, &quot;Java with optional types and no need to pre-declare variables.&quot;  So, a Java programmer probably needs all of about thirty seconds to understand the language. </p>

<p> What I didn't expect was how quickly I could integrate it into my applications.  Here's an example.  I've got a Swing desktop application for which I wanted to add a small command shell pane.  I spent about an hour working with Swing's JTextArea and rolling my own parser.  It was late at night and I was short on bright ideas.  Finally, about the time I realized I was going to need variables and flow control, I pulled the emergency stop cord and backed up. </p>

<p> After downloading the full beanshell JAR file (about 280K) and adding it to my build path, all I had to do was this: </p>  

<pre>  
JConsole console = new JConsole();   
frame.getContentPane().add(new JScrollPane(console), BorderLayout.SOUTH);  
Interpreter interpreter = new Interpreter(console);   
new Thread(interpreter).start(); 
</pre>  

<p> Those four lines of code are literally all that is needed to get a beanshell prompt running inside of your application.  From the prompt, you have access to every class and method within your application.  If you've got some kind of object registry or namespace, or any kind of Singletons, then you'll also have access to real object instances. </p>

<p> Beanshell has a built-in command called &quot;desktop()&quot;.  That one line will launch a Smalltalk-style IDE, with class browser and interpreter window.  This desktop is still part of your application's JVM.  It lacks most of the power of Smalltalk's library, which evolved together with the workspace to support highly dynamic programming.  Nevertheless, the beanshell desktop retains the immediacy of working in Smalltalk. </p>

<p> References: </p> <ul>  <li><a href="http://www.beanshell.org/">Beanshell</a></li>  <li><a href="http://www.squeak.org/">Squeak</a> - a Free, modern Smalltalk</li>  <li><a href="http://www.programmers-friend.org/JOI/">Java Object Inspector</a> - A simple Swing-based inspector, can be launched on any Java object from a Beanshell prompt.  A great complement to Beanshell</li> </ul> 

]]></content></entry><entry><title>One of the most fun features of my current project</title><link href="https://michaelnygard.com/blog/2005/04/one-of-the-most-fun-features-of-my-current-project/"/><id>https://michaelnygard.com/blog/2005/04/one-of-the-most-fun-features-of-my-current-project/</id><published>2005-04-12T00:32:00-05:00</published><updated>2005-04-12T00:32:00-05:00</updated><content type="html"><![CDATA[<p> One of the most fun features of my current project is our &quot;extreme feedback monitor&quot;.  We're using <a href="http://cruisecontrol.sourceforge.net">CruiseControl</a> to build our entire codebase, including unit tests, acceptance tests, and quality metrics, every five minutes.  To make a broken build painfully obvious, we've got a stoplight hanging on one wall of the room.  (I may post some pictures later, if there's interest.) </p><p> Kyle Larson found the stoplight itself in a gift shop (Spencer's, maybe, I can't remember... Kyle, help me out here).  It had just one plug but you could push each light as a separate switch. </p><p> Well, it looks pretty dumb to walk over and push on the red light to show a broken build.  It's not pragmatic and it's not automated.  So, Kyle rewired it with two additional cords, so each lamp has its own plug. </p><p> I plugged each lamp into an <a href="http://www.x10.com">X10</a> lamp module so each color could be turned on and off individually.  I hooked a &quot;FireCracker&quot; wireless transmitter up to the serial port on the build box.  With one switched receiver and two lamp modules, we were ready to go. </p><p> CruiseControl supports a publisher that is supposed to integrate directly with X10 devices over the serial port.  Unfortunately, the installation and setup for Java programs to work with X10 devices on Linux is... problematic.  First off, the JavaComm API appears to be totally stagnant.  
It does not support Linux at all, so you have to install the Solaris SPARC version, but supply an open-source Linux implementation of the API (www.rxtx.org), replacing a .properties file.  Then you have to make sure that the user running your build loop is a member of the &quot;tty&quot; group.  Then just cross your fingers. </p><p> I got all of the above to work from my Java test apps, but the X10 publisher built into CC still couldn't open the serial port. </p><p> I finally gave up on the built-in publisher.  I used wget, BottleRocket, and a shell script to check the build status web page every 30 seconds and change the lights accordingly. </p><p> Now, within a minute of a broken build, we can all see it.  When the light is green, the build is clean. </p><p> If the red light means &quot;broken build&quot;, and the green light means &quot;good build&quot;, you might wonder what we use yellow for. </p><p> Yellow means that someone is in the process of synchronizing and committing code.  Along with the FireCracker module, we also got a remote control.  That normally sits in the middle of the tables in the lab.  Whenever a pair needs to check in code, they grab the remote (i.e., take the semaphore) and turn on the yellow light.  As an added &quot;feature&quot;, the wireless switched receiver is the only module that makes an audible &quot;click&quot; when it switches.  We use that one to control the yellow lamp, so we also have an auditory cue when a pair starts their commit dance. </p><p> After committing, the pair turns off the yellow light and replaces the remote, thus putting the semaphore and allowing the next pair to commit.  In the event of multiple blocked pairs, FIFO behavior is not guaranteed.  Semaphore holders have been known to be susceptible to flattery and bribery. 
</p>]]></content></entry><entry><title>I forgot to mention that I will be speaking at OTUG</title><link href="https://michaelnygard.com/blog/2005/03/i-forgot-to-mention-that-i-will-be-speaking-at-otug/"/><id>https://michaelnygard.com/blog/2005/03/i-forgot-to-mention-that-i-will-be-speaking-at-otug/</id><published>2005-03-22T23:08:00-06:00</published><updated>2005-03-22T23:08:00-06:00</updated><content type="html"><![CDATA[<p> I forgot to mention that I will be speaking at OTUG on April 19th!  I will be speaking on &quot;Living With Systems in Production: Avoiding Heartbreak in Long-Term Relationships With Your Code&quot; </p><p> From the summary of the talk: </p> <blockquote>Everything changes after Release 1.0. One batch of consultants leave, key developers jockey to get themselves reassigned, and the free-wheeling development environment is replaced by the painful rigor of operations. Or, at least, it should be. Systems in production require a different kind of care and feeding. If you have to live with a system in production, your quality of life is largely determined by the things you put in place before Release 1.0. This talk covers the topics that will give you God-like powers over your production systems. If you are an architect or developer who has ever put a system in production--or expects to put a system into production--then this talk is for you.</blockquote> <p> Much of this will be derived from my experiences at Totality and Best Buy.  
Spending time in operations gave me a great education about building systems to run, instead of building them to pass QA.</p>]]></content></entry><entry><title>Leaving AntHill for CruiseControl</title><link href="https://michaelnygard.com/blog/2005/03/leaving-anthill-for-cruisecontrol/"/><id>https://michaelnygard.com/blog/2005/03/leaving-anthill-for-cruisecontrol/</id><published>2005-03-22T22:55:00-06:00</published><updated>2005-03-22T22:55:00-06:00</updated><content type="html"><![CDATA[<p> We've been using <a href="http://www.urbancode.com/projects/anthill/default.jsp">AntHill</a> to do continuous builds.  It has served us well, but we're now moving away from it and towards <a href="http://cruisecontrol.sourceforge.net">CruiseControl</a>. </p><p> There are a few main reasons for this.  First and foremost, AntHill runs inside of Tomcat.  This is billed as a feature, but for us, it was a big problem.  There are two layers of Java containers between your OS and your build.  Trying to get environment variables (like &quot;DISPLAY=localhost:99.0&quot;) passed from an init script, to Tomcat, to AntHill, to ANT, was just becoming too burdensome. </p><p> We also experienced some serious classpath pollution.  Some things just acted differently between ANT builds on our development boxes and ANT builds on the build box.  That's unacceptable, but we found that with AntHill it was impossible to eliminate the differences.  Finally, through some jar file unpacking and decompilation, we found that our builds were picking up classes from AntHill's own jars. </p><p> The ability to fix these things exists, but only in AntHill Pro.  I downloaded CruiseControl today and spent an hour going through the quick start and FAQ.  At the end of it, I had our build process replicated on CruiseControl. </p><p> I did run into a problem... the checkstyle task that we have been running as part of our build all along started failing.  
I assumed that it was something wrong with the build box, or with my project configuration for CruiseControl.  After half an hour or so, I ran the same build on a dev box, but <em>from the command line</em>.  It failed there, too.  It turns out that checkstyle includes a version of the Jakarta commons-collections classes that is not compatible with the Jakarta digester version that we've added to our code base. </p><p> This problem existed all along.  Running the build under CruiseControl was enough like running it from the command line that it uncovered a problem which had been present for over two weeks.  For some reason, running under AntHill never revealed this problem. </p><p> Bottom line is, a CI server needs to be as close to running a command-line build as possible.  If I have to spend time figuring out what environmental conditions the CI tool is imposing on my build, then it is defeating the purpose. </p><p> Now, I just have to figure out why in the hell checkstyle's classpath is leaking into the classpath of the code it is checking. </p>]]></content></entry><entry><title>The Veteran and the Master</title><link href="https://michaelnygard.com/blog/2005/02/the-veteran-and-the-master/"/><id>https://michaelnygard.com/blog/2005/02/the-veteran-and-the-master/</id><published>2005-02-05T22:24:00-06:00</published><updated>2005-02-05T22:24:00-06:00</updated><content type="html"><![CDATA[<p> The aged veteran said to the master, &quot;See how many programs I have written in my labors.  All of these works I have created needed no more than a text editor and a compiler.&quot; The master said, &quot;I do have an editor; indeed, I have also a compiler.&quot; </p><p> Said the aged one, &quot;Yet you shackle them within an 'environment'.  Why must your environment be integrated?  My environment has never been integrated, yet I am a mighty programmer.&quot; </p><p>The master said, &quot;You are truly a mighty programmer.  
I perceive that you, in your keen intellect, can hold entire class hierarchies in mind at once.  Such abilities of apprehension are to be respected.&quot; </p><p> The veteran was well pleased and said, &quot;It is true.  Hence I am lead programmer.&quot; </p><p> The master nodded.  &quot;Sadly, I have not your powers of visualization.  I cannot hold entire hierarchies in my mind's eye at once.  In my limited faculties, I must focus entirely on one class at a time.  The tool remembers the rest, as I cannot.&quot; </p><p> Emboldened, the aged veteran boasted, &quot;See the commands fly from my fingertips!  I type faster than other programmers think!&quot; </p><p> Again, the master nodded his agreement, &quot;I am not so blessed with speed as you.  It is a burden and a trial to move so slowly.  Behold, this measure of the marvel of your fingers.  Such is the flight of your keystrokes that in the time it takes you to execute a regex replace across thirty files; compile the project; note the errors; and edit the twelve files with failed replacements; I will have barely completed the 'rename refactor' which I started by typing shift-alt-r.&quot; </p><p> Brazen in his opponent's weakness, the veteran cried, &quot;While you sit meditating at the green bar, I pound out another four thousand lines of code!&quot; </p><p> Again, the master nodded, &quot;Yes.  And worse, while you write the next thousand, I will surely erase a thousand more, leaving us barely past where we began.  It is clear that I cannot long contend in this field against such as yourself.&quot; </p><p> The battle-scarred veteran, his opponent beaten, laughed aloud.  Barely bothering to express his contempt, he sneered, &quot;And what fine code it is, too!  You write a fraction of the code a real programmer could produce.  As a coward in the grain, you shrink from any real challenge.  
Fearing to tread where real programmers dwell, you trade in coin like a merchant, purchasing the work of others, or worse, living on the charity of those motley-clad coders who give away the fruits of their work.&quot; </p><p> &quot;Again, your perspicacity has unmasked me,&quot; said the master.  &quot;Knowing myself to produce bugs in my code, I prefer to write little of it.  I do rely upon the work of others who, if not being smarter than myself, are at least more numerous than I.  Had I your fleet fingers, I might not need to download these gifts offered by others.  Indeed, I am certain that your mighty editor would surely outpace my mere web browser, and you could then code a new SVG renderer long before I will finish downloading Batik to do the same work.  Alas, lacking your skills, I must fend for myself as best I can by reusing that which I can.  Since each line of code costs me so greatly, it behooves me to write little, and I must needs make use of what aids I can.&quot; </p><p> Shaking his head, the aged veteran stalked away, safely assured that he had gauged the so-called master truly.  He returned to his labors, building a parser for the scripting language of his workflow engine.  This would be placed inside of an application that would someday have users. </p><p> Shaking his head, the master returned his eye to the red bar of his users' new acceptance tests.  Reaching deliberately for the keyboard, he changed two methods  and added one test case.  In the serene green light of the test bar, he reflected a moment on the code he had added.  Unruffled by the staccato typing in the direction of the veteran, he renamed four fields, extracted a method, and pulled it up into a new base class.  Comforted by the tranquil green light, the master rested his hands a moment, then lifted them from the keyboard and walked away. </p><p> From the corner of his eye, the veteran observed the master leaving.  
&quot;Charlatan,&quot; he snarled, as the regexes flew from his hands, long, long into the night. </p>]]></content></entry><entry><title>On Relativism and Social Constructions</title><link href="https://michaelnygard.com/blog/2005/01/on-relativism-and-social-constructions/"/><id>https://michaelnygard.com/blog/2005/01/on-relativism-and-social-constructions/</id><published>2005-01-17T18:51:00-06:00</published><updated>2005-01-17T18:51:00-06:00</updated><content type="html"><![CDATA[<p>The key operative precept of post-modernism is that all reality is a social construct.  Since no institution or normative behavior stems from natural cause, and there is no objective, external reality, then all institutions and attitudes are just social constructs.  They exist only through the agreement of the participants.   </p><p>  Nothing can be sacred, since sanctification comes from <em>outside</em>, by definition. </p><p>  If nothing is sacred, and institutions have no more reality than a children's amorphous game of ball, they deduce that any construct can be reconstructed through willful choice. </p><p>  Even if you accept the precept that there is no objective, external (let alone universal) value system, you can still see the fundamental fallacy in this thinking. </p><p>  Anyone who has ever tried to bring change into a hidebound organization knows that social constructs are far harder to change than any physical or legal structure.  You can reorganize units, bring locations together, shuffle management, or get rid of half of the people.  Still, underlying social organization will re-emerge as long as there is any vestige of continuity. </p><p>  Much of the heat energy in the ongoing culture war arises from this inertia.  Those who are so tiresomely labelled as &quot;liberal&quot;, &quot;progressive&quot;, the &quot;Left&quot;, the &quot;Cultural Elite&quot;, etc. represent a large force of people aimed at deliberately reconstructing every institution in Western life.  
They have decided, based on their own feelings, bereft of natural or religious law, that any institution observed by men for more than one hundred years cannot be endured.  They are organized around the post-modern paradigm--armed with Hayakawa and Chomsky--and don't accept that some hidebound Neanderthals will not welcome forceful re-education. </p><p>  I suppose that I follow a third way.  I can agree that our institutions are social constructs.  That does not mean that they can, or should be, tampered with lightly.  The concept of &quot;natural law&quot; teaches that certain modes of behavior, certain morals, generate a more successful society.  Our social institutions--like marriage--have undergone the same forces of competitive pressures and differential reproduction that drive neo-Darwinian evolution.  That means the institutions we observe today--such as preserving the integrity of personal property--are the ones that <em>worked</em>. </p><p>  There is an argument to be made that I'm advocating cultural imperialism.  It could perhaps be seen that way, though such is not my intent.  Rather, just as we should justifiably be wary of changing our own genetic code, we should be wary of making large changes to our social institutions.  We do not know what will result.  There are many paths down the mountain, but only one upward.  Most random mutations result in death.  Even well-planned changes have unintended, sometimes catastrophic, effects. 
</p><p>  <strong>References</strong>  </p><ul> <li><a href="http://www.amazon.com/exec/obidos/tg/detail/-/0140178740/qid=1106010983/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/103-0409916-1183049?v=glance&amp;s=books&amp;n=507846">The Collapse of Chaos</a></li> <li><a href="http://www.amazon.com/exec/obidos/tg/detail/-/0195079515/qid=1106011025/sr=8-1/ref=pd_csp_1/103-0409916-1183049?v=glance&amp;s=books&amp;n=507846">The Origins of Order</a></li> <li><a href="http://www.amazon.com/exec/obidos/tg/detail/-/0316346624/qid=1106011053/sr=8-1/ref=pd_csp_1/103-0409916-1183049?v=glance&amp;s=books&amp;n=507846">The Tipping Point</a></li> <li><a href="http://www.amazon.com/exec/obidos/tg/detail/-/1578517087/qid=1106011086/sr=8-1/ref=pd_csp_1/103-0409916-1183049?v=glance&amp;s=books&amp;n=507846">The Social Life of Information</a></li> </ul>  
]]></content></entry><entry><title>An IKEA Weekend</title><link href="https://michaelnygard.com/blog/2005/01/an-ikea-weekend/"/><id>https://michaelnygard.com/blog/2005/01/an-ikea-weekend/</id><published>2005-01-11T20:34:00-06:00</published><updated>2005-01-11T20:34:00-06:00</updated><content type="html"><![CDATA[<p>I've been building a new office in my downstairs space for quite a while now. It's a &quot;weekends&quot; project for someone who doesn't have very many weekends. In early December, I broke down and hired a contractor to install the laminate (&quot;cardboard&quot;) flooring, which was the penultimate step in the master plan. </p><p>  Last comes furniture, then moving in. (Which starts the chain of dominoes, as my eldest gets the bedroom which used to be my office, then my youngest takes her spot, which makes room for the new baby. The challenge is to finish with the hole migration before the new electron gets injected. No, that wasn't a spelling error.) </p><p>  So this weekend, I had thirty-six boxes of IKEA modular furniture from &quot;Work IKEA&quot; to assemble. </p><p>  You have time to meditate on many lessons when you are assembling thirty-six boxes of IKEA modular furniture. </p><p>  For example, I've never seen a company that makes it so difficult to purchase from them. I don't really want to know that the six-shelf bookshelf I picked out from the design software actually comes as three separate SKUs. Just sell me the damn shelf. </p><p>  I shouldn't have to learn what a &quot;CDO&quot; is in order to pick out a bunch of stuff and have them deliver it on a specific day. I shouldn't have to make three trips into the store because they cannot take my credit card number over the phone. </p><p>  And can someone <span style="font-style: italic">please</span> explain why I have to <span style="font-style: italic">remove</span> items from my delivery order because the local store doesn't have them in stock? 
In some fields of endeavor, timing is everything, but why should I have to call them every day to find out when the left-handed tabletop comes in, then rush to the store and place my order so the piece can be pulled from inventory? </p><p>  It makes no sense to me. The whole process was implemented for the convenience of IKEA, not IKEA's customers. They've made a business decision to optimize for cost control rather than customer satisfaction. IKEA is certainly free to make that choice, and they do seem to be making profits, but I'm not likely to choose them for future furniture purchases. </p><p>  Exposing that much of your internal process to the customer--or end user--is never a good way to win the hearts and minds of your customers. </p><p>  Most of the assembly went without incident, though I was often perplexed by trying to map the low-level components into the high-level items I designed with. IKEA offers zero-cost software for download to design a floorplan with their lines, but it works at a higher level of abstraction. I was often left wondering which item a particular component was supposed to construct. </p><p>  The components were very well designed. Each piece can either fit together in only one way, or it is rotationally symmetric so either orientation works. In either case, I, the assembler, am not left with an ambiguous situation, where something might fit but does not work. </p><p>  The toughest pieces were the desks. Desks can be configured in about eighty-nine different ways. The components are all modular and generally have the same interfaces. I have a lot of flexibility at my disposal, but at the expense of complexity. A significant number of sample configurations helped me understand the complexity of options and pick a reasonable structure, but I can't help but wonder how the experience could be simplified. </p><p>  The furniture is all assembled now, and the office sits expectantly waiting for its occupant, full of unrealized potential. 
</p> 
]]></content></entry><entry><title>Uniting Reason and Passion</title><link href="https://michaelnygard.com/blog/2004/12/uniting-reason-and-passion/"/><id>https://michaelnygard.com/blog/2004/12/uniting-reason-and-passion/</id><published>2004-12-12T15:14:00-06:00</published><updated>2004-12-12T15:14:00-06:00</updated><content type="html"><![CDATA[<p> Reason and Passion need not conflict. Reason without passion is dusty, dry, and dead. Reason without passion leads to moral relativity. If nothing moves the thinker to passion, then all subjects are equal and without distinction. As well to discuss the economic benefits of the euthanasia of infants as the artistic merits of urinals. </p><p>  Passion without reason brings the indiscriminate energy of a summer's thunderstorm. Too much energy unbound, without direction, its fury as constant as the winds of the air. </p><p>  Passion provides energy, the drive to accomplish, change, improve, or destroy. Reason provides direction. Reason channels Passion and achieves goals by identifying targets, foci, leverage points. Passion powers Reason. It brings motive power. Passion knows that things must be done and that change is possible. Reason knows how change may be effected. </p><p>  I was reminded of the fallacy of Passion without Reason recently. At lunch with a friend, she talked about working with a non-profit organization. Workers for non-profits epitomize those who are driven by Passion. Agree or disagree with their aims, you must admit that they earnestly mean to change the world. My friend, who comes from the profit-driven corporate world, was explaining some aspects of statistical process control and how it could be applied to improve fundraising results on their website. She was told that she needed to have more heart and feel for those unfortunates that this group helps. </p><p>  Her critic obviously felt that her approach was too analytical. Too driven by Reason, not enough Passion. In fact, the opposite was true. 
She was applying the combination of Reason and Passion. Passion showed her that the cause was worthy and that she could help. Reason showed her where leverage could be gained and a small effort input could result in a large change in output. </p><p>  In various dysfunctional organizations which I have inhabited, I've seen many examples of the opposite. Reason reveals problems and solutions to those poor sapient cogs in the low levels of the machine. They lack the Passion to see that change is possible and so divest themselves of the power to improve their own lot in life. Problems or challenges will always overcome such people, because they give the problem power and remove it from themselves. </p><p><br /> </p> 
]]></content></entry><entry><title>More Wiki</title><link href="https://michaelnygard.com/blog/2004/12/more-wiki/"/><id>https://michaelnygard.com/blog/2004/12/more-wiki/</id><published>2004-12-10T11:47:00-06:00</published><updated>2004-12-10T11:47:00-06:00</updated><content type="html"><![CDATA[<p> My personal favorite is <a href="http://www.twiki.org/">TWiki</a>.  It has some nice features like file attachments, a great search interface, high configurability, and a rich set of available plugins (including an XP tracker plugin.) </p><p>  One cool thing about TWiki: configuration settings are accomplished through text on particular topics. For example, each &quot;web&quot; (set of interrelated topics) has a topic called &quot;WebPreferences&quot;. The text on the WebPreferences topics actually controls the variables. Likewise, if you want to set personal preferences, you set them as variables--in text--on your personal topic. It's a lot harder to describe than it is to use. </p><p>  There are some other nice features like role-based access control (each topic can have a variable that says which users or groups can modify the topic), multiple &quot;webs&quot;, and so on. </p><p>  The search interface is available as variable interpolation on a topic, so something like the &quot;recent changes&quot; topic just ends up being a date-ordered search of changes, limited to ten topics. This means that you can build dynamic views based on content, metadata, attachments, or form values. I once put a search variable on my home topic that would show me any task I was assigned to work on or review. </p><p>  I've also been looking at <a href="http://oahuwiki.sourceforge.net/">Oahu Wiki</a>. It's an open source Java wiki. It's fairly short on features at this point, but it has by far the cleanest design I've seen yet. I look forward to seeing more from this project. </p><p><br /> </p> 
]]></content></entry><entry><title>Wiki Proliferation</title><link href="https://michaelnygard.com/blog/2004/12/wiki-proliferation/"/><id>https://michaelnygard.com/blog/2004/12/wiki-proliferation/</id><published>2004-12-10T11:14:00-06:00</published><updated>2004-12-10T11:14:00-06:00</updated><content type="html"><![CDATA[<p>  Wikis have been thoroughly mainstreamed now.  You know how I can tell?  Spammers are targeting them. </p><p>  Any wiki without access control is going to get steamrolled by a bunch of Russian computers that are editing wiki pages. They replace all the legitimate content with links to porn sites, warez, viagra, get rich now, and the usual panoply of digital plaque. </p><p>  The purpose does not appear to be driving traffic directly to those sites from the wikis. Instead, they are trying to pollute Google's page rankings by creating thousands upon thousands of additional inbound links. </p><p>  If you run a wiki, be sure to enable access control and versioning (so you can recover after an attack). It is a shame that the open, freewheeling environment of the wiki has to end. It seems that the only way to preserve the value of the community is to weaken the core value of open participation that made the community worthwhile. </p><p><br /> </p> 
]]></content></entry><entry><title>Moving on</title><link href="https://michaelnygard.com/blog/2004/12/moving-on/"/><id>https://michaelnygard.com/blog/2004/12/moving-on/</id><published>2004-12-07T22:19:00-06:00</published><updated>2004-12-07T22:19:00-06:00</updated><content type="html"><![CDATA[<p> The latest in my not-exactly-daily news and commentary... </p><p>  As of December 10th, I will be leaving Totality Corporation. It has been a challenge and an education. It has also been an interesting time, as we uncovered the hidden linkages from daily activities to ultimate profitability. The managed service provider space is still new enough that the business models are not all so well-defined and understood as in consulting. I earnestly hope that I am leaving Totality in a much better place than it was when I joined. </p><p>  Still, a number of positive attractions to the new position and some negative forces away from my current position have overcome inertia. </p><p>  I will be joining <a href="http://www.atico.com/">Advanced Technologies Integration</a> as a consultant. I will be forming a team with Kyle Larson, Dale Schumacher, and Dion Stewart to do a development project for one of ATI's clients. The project itself has some moderately interesting requirements... it's not just another random commerce site. (I'm really, really bored with shopping carts!) The thing that really attracted me though, is that this is a hardcore agile methods project. We'll be using a combination of Scrum and XP. </p><p>  For a long time, I've advocated small teams of highly skilled developers. I have seen such teams produce many times the business value (and ROI) of the typical team. ATI and this client are willing to subscribe to the theory that a small, high-caliber team will outperform an army of cheap morons. </p><p>  It's going to be a blast proving them right! </p>  
]]></content></entry><entry><title>Too Much Abstraction</title><link href="https://michaelnygard.com/blog/2004/04/too-much-abstraction/"/><id>https://michaelnygard.com/blog/2004/04/too-much-abstraction/</id><published>2004-04-25T13:09:00-05:00</published><updated>2004-04-25T13:09:00-05:00</updated><content type="html"><![CDATA[<p>The more I deal with infrastructure architecture, the more I think that somewhere along the way, we have overspecialized.  There are too many architects that have never lived with a system in production, or spent time on an operations team.  Likewise, there are a lot of operations people that insulate themselves from the specification and development of systems for which they will ultimately take responsibility. </p><p> The net result is suboptimization in the hardware/software fit.  As a result, overall availability of the application suffers. </p><p> Here's a recent example. </p><p> First, we're trying to address the general issue of flowing data from production back into pre-production systems -- QA, production support, development, staging.  The first attempt took 6 days to complete.  Since the requirements of the QA environment stipulate that the data should be no more than one week out of date relative to production, that's a big problem.  On further investigation, it appears that the DBA who was executing this process spent most of the time doing <em>scp</em>s from one host to another.  It's a lot of data, so in one respect 10 hour copies are reasonable. </p><p>  But the DBA had never been told about the storage architecture.  That's the domain of a separate &quot;enterprise service&quot; group.  They are fairly protective of their domain and do not often allow their architecture documents to be distributed.  They want to reserve the right to change them at will.  Now, they will be quite helpful if you approach them with a storage problem, but the trick is knowing when you have a storage problem on your hands. 
</p><p>  You see, all of the servers that the DBA was copying files from and to are all on the same SAN.  An scp from one host on the SAN to another host on the SAN is pretty redundant. </p><p>  There's an alternative solution that involves a few simple steps: Take a database snapshot onto a set of disks with mirrors, split the mirrors, and join them onto another set of mirrors, then do an RMAN &quot;recovery&quot; from that snapshot into the target database.  Total execution time is about 4 hours. </p><p>  From six days to four hours, just by restating the problem to the right people. </p><p>  This is not intended to criticize any of the individuals involved.  Far from it, they are all top-notch professionals.  But the solution required merging the domains of knowledge from these two groups -- and the organizational structure explicitly discouraged that merging. </p><p>  Another recent example. </p><p>  One of my favorite conferences is the <a href="http://www.softwaresummit.com/">Colorado Software Summit</a>.  It's a very small, intensely technical crowd.  I sometimes think half the participants are also speakers.  There's a year-round mailing list for people who are interested in, or have been to, the Summit.  These are <em>very</em> skilled and talented people.  This is easily the top 1% of the software development field. </p><p>  Even there, I occasionally see questions about how to handle things like transparent database connection failover.  I'll admit that's not exactly a journeyman topic.  Bring it up at a party and you'll have plenty of open space to move around in.  What surprised me is that there are some fairly standard infrastructure patterns for enabling database connection failover that weren't known to people with decades of experience in the field.  (E.g., cluster software reassigns ownership of a virtual IP address to one node or the other, with all applications using the virtual IP address for connections). 
</p><p>  This tells me that we've overspecialized, or at least, that the groups are not talking nearly enough.  I don't think it's possible to be an expert in high availability, infrastructure architecture, enterprise data management, storage solutions, OOA/D, web design, and network architecture.  Somehow, we need to find an effective way to create joint solutions, so we don't have software being developed that's completely ignorant of its deployment architecture, nor should we have infrastructure investments that are not capable of being used by the software.  We need closer ties between operations, architecture, and development. </p>  
]]></content></entry><entry><title>The Lights Are On, Is Anybody Home?</title><link href="https://michaelnygard.com/blog/2003/05/the-lights-are-on-is-anybody-home/"/><id>https://michaelnygard.com/blog/2003/05/the-lights-are-on-is-anybody-home/</id><published>2003-05-01T12:18:56-05:00</published><updated>2003-05-01T12:18:56-05:00</updated><content type="html"><![CDATA[<p>
We pay a lot of attention to stakeholders when we create systems. The end users get a say, as do the Gold Owners. Analysts put their imprimatur on the requirements. In better cases, operations and administration add their own spin. It seems like the only group that doesn't have any input during requirements gathering is the development team itself. That is truly unfortunate. 
</p><p>
Not even the users will have to live with the system more than the developers will. Developers literally inhabit the system for most of their waking hours, just as much as (or maybe more than) they inhabit their cubes or offices. When the code is messy, nobody suffers more than the developers. When living in the system becomes unpleasant, morale will suffer. Any time you hear a developer ask for a few weeks of &quot;cleanup&quot; after a release, what they are really saying is, &quot;This room is a terrible mess. We need to remodel.&quot; 
</p><p>
A code review is just like an episode of &quot;Trading Spaces&quot;. Developers get to trade problems for a while, to see if somebody else can see possibilities in their dwelling. Rip out that clunky old design that doesn't work any more! Hang some fabric on the walls and change the lighting. 
</p><p>
Whether your virtual working environment becomes a cozy place, a model of efficiency, or a cold, drab prison, you create your own living space. It is worth taking some care to create a place you enjoy inhabiting. You will spend a lot of time there before the job is done.
</p>
 
]]></content></entry><entry><title>Don't Build Systems That Boink</title><link href="https://michaelnygard.com/blog/2003/04/dont-build-systems-that-boink/"/><id>https://michaelnygard.com/blog/2003/04/dont-build-systems-that-boink/</id><published>2003-04-01T16:00:04-06:00</published><updated>2003-04-01T16:00:04-06:00</updated><content type="html"><![CDATA[<p><i>Note: This piece originally appeared in the &quot;Marbles Monthly&quot; newsletter in April 2003</i></p>

<p>
I caught an incredibly entertaining special on The Learning Channel last week.  A bunch of academics decided that they were going to build an authentic Roman-style catapult, based on some ancient descriptions.  They had great plans, engineering expertise, and some really dedicated and creative builders.  The plan was to hurl a 57 pound stone 400 yards, with a machine that weighed 30 tons.  It was amazing to see the builders' faces swing between hope and fear.  The excitement mingled with apprehension.  
</p>

<p>
At one point, the head carpenter said that it would be wonderful to see it work, but "I'm fairly certain it's going to boink."  I immediately knew what he meant.  "Boink" sums up all the myriad ways this massive device could go horribly wrong and wreak havoc upon them all.  It could fall over on somebody.  It could break, releasing all that kinetic energy in the wrong direction, or in every direction.  The ball could fly off backwards.  The rope might relax so much that it just did nothing.  One of the throwing arms could break.  They could both break.  In other words, it could do anything other than what it was intended to do.
</p>

<p>
That sounds pretty familiar.  I see the same expressions on my teammates' faces every day.  This enormous project we're slaving on could fall over and crush us all into jelly.  It could consume our hours, our minds, and our every waking hour.  Worst case, it might cost us our families, our health, our passion.  It could embarrass the company, or cost it tons of money.  In fact, just about the most benign thing it could do is nothing.
</p>

<p>
So how do you make a system that doesn't boink?  It is hard enough just making the system do what it is supposed to.  The good news is that some simple "do's and don'ts" will take us a long way toward non-boinkage.
</p>


<p><b>Automation is Your Friend #1: Run lots of tests -- and run them all the time </b></p>
<p>
Automated unit tests and automated functional tests will guarantee that you don't backslide.  They provide concrete evidence of your functionality, and they force you to keep your code integrated.
</p>

<p><b>Automation is Your Friend #2: Be fanatic about build and deployment processes </b></p>
<p>
A reliable, fully automated build process will prevent headaches and heartbreaks.  A bad process--or a manual process--will introduce errors and make it harder to deliver on an iterative cycle.
</p>

<p>
Start with a fully automated build script on day one.  Start planning your first production-class deployment right away, and execute a deployment within the first three weeks.  A build machine (it can be a workstation) should create a complete, installable numbered package.  That same package should be delivered into each environment.  That way, you can be absolutely certain that QA gets exactly the same build that went into integration testing.  
</p>

<p>
Avoid the temptation to check out the source code to each environment.  An unbelievable amount of downtime can be traced to a version label that changed between the QA build and the production build.
</p>

<p><b>Everything In Its Place </b></p>
<p>Keep things separated that change at different speeds.  Log files change very fast, so isolate them.  Data changes a little less quickly but is still dynamic.  "Content" changes slower yet, but is still faster than code.  Configuration settings usually come somewhere between code and content.  Each of these things should go in its own location, isolated and protected from the others.
</p>


<p><b>Be transparent </b></p>
<p>Log everything interesting that happens.  Log every exception or warning.  Log the start and end of long-running tasks.  Always make sure your logs include a timestamp!
</p>

<p>
Be sure to make the location of your log files configurable.  It's not usually a good idea to keep log files in the same filesystem as your code or data.  Filling up a filesystem with logs should not bring your system down.
</p>
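<p>
Here is a minimal sketch of those logging rules in Python.  It is an illustration, not a prescription: the function names and the <code>APP_LOG_DIR</code> environment variable are made up for this example.  It puts a timestamp on every line, takes the log location from configuration, and records the start, end, and any exception of a long-running task.
</p>

```python
import logging
import os
import tempfile


def build_logger(name, log_dir=None):
    """Return a logger that writes timestamped lines to a configurable directory."""
    # APP_LOG_DIR is a hypothetical setting name; fall back to the temp directory.
    log_dir = log_dir or os.environ.get("APP_LOG_DIR", tempfile.gettempdir())
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(os.path.join(log_dir, name + ".log"))
        # %(asctime)s puts a timestamp at the front of every line.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_logged(logger, task_name, task):
    """Log the start and end of a long-running task, plus any exception."""
    logger.info("start %s", task_name)
    try:
        return task()
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("failed %s", task_name)
        raise
    finally:
        logger.info("end %s", task_name)
```

<p>
Pointing <code>log_dir</code> at a separate filesystem keeps a runaway log from filling the disk that holds your code or data.
</p>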

<p><b>Keep your configuration out of your code </b></p>
<p>
It is always a good idea to separate metadata from code.  This includes settings like host names, port numbers, database URLs and passwords, and external integrations.
</p>

<p>
A good configuration plan will allow your system to exist in different environments -- QA versus production, for example.  It should also allow for clustered or replicated installations.
</p>
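<p>
One way this might look in practice, sketched in Python under some assumptions: each environment gets its own JSON file (<code>qa.json</code>, <code>production.json</code>) in a directory outside the code tree.  The file layout, the key names, and the <code>load_config</code> function are all hypothetical.
</p>

```python
import json
import os

# Hypothetical setting names, following the kinds of settings listed above.
REQUIRED_KEYS = ("db_url", "db_password", "host", "port")


def load_config(environment, config_dir):
    """Read the settings for one environment from config_dir/ENVIRONMENT.json."""
    path = os.path.join(config_dir, environment + ".json")
    with open(path) as handle:
        settings = json.load(handle)
    # Fail at startup, not at first use, when a setting is absent.
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError("missing settings in %s: %s" % (path, ", ".join(missing)))
    return settings
```

<p>
The same installable package can then run in QA or production.  Only the configuration directory differs, and each node in a cluster can point at its own file.
</p>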

<p><b>Keep your code and your data separated </b></p>
<p>
The object-oriented approach is a good way to build software, but it's a lousy way to deploy systems.  Code changes at a different frequency than data.  Keep them separated.  For example, in a web system, it should be easy to deploy a new code drop without disrupting the content of the site.  Likewise, new content should not affect the code.
</p>                                
 
]]></content></entry><entry><title>Plugging the Marbles Newsletter</title><link href="https://michaelnygard.com/blog/2003/03/plugging-the-marbles-newsletter/"/><id>https://michaelnygard.com/blog/2003/03/plugging-the-marbles-newsletter/</id><published>2003-03-24T21:34:00-06:00</published><updated>2003-03-24T21:34:00-06:00</updated><content type="html"><![CDATA[<p>Not too much going on here lately.  Most of my waking hours have been billable for the past few months.  That's good and bad, in so many different ways.</p>  <p>Most of my recent writing has been for the <a href="http://www.marblesit.com/">Marbles, Inc.</a> monthly newsletter.</p><p><em>Dec 2006 Edit: Marbles IT has not been a going concern for some time.&nbsp; My articles for the Marbles Monthly newsletter are now available under the Marbles category of this blog.</em> <br /></p><br /> 
]]></content></entry><entry><title>Multiplier Effects</title><link href="https://michaelnygard.com/blog/2003/02/multiplier-effects/"/><id>https://michaelnygard.com/blog/2003/02/multiplier-effects/</id><published>2003-02-01T10:53:11-06:00</published><updated>2003-02-01T10:53:11-06:00</updated><content type="html"><![CDATA[<p>
Here's one way to think about the ethics of software, in terms of multipliers. Think back to the last major email virus, or when the movie "The Two Towers" was released.  No doubt, you heard or read a story about how much lost productivity this bane would cause.  There is always some analyst willing to publish some outrageous estimate of damages due to these intrusions into the work life.  I remember hearing about the millions of dollars supposedly lost to the economy when Star Wars Episode I was released.
</p><p>
(By the way, I have to take a minute to disassemble this kind of analysis. Stick with me, this won't take long.
</p><p>
If you take 1.5 seconds to delete the virus, it costs nothing. It's an absolutely immeasurable impact to your day. It won't even affect your productivity. You will probably spend more time than that discussing sports scores, going to the bathroom, chatting with a client, or any of the hundreds of other things human beings do during a day. It's literally lost in the noise. Nevertheless, some analyst who likes big numbers will take that 1.5 seconds and multiply it by the millions of other users and their 1.5 seconds, then multiply that by the "national average salary" or some such number.
</p><p>
So, even though it takes you longer to blow your nose than to delete the virus email, somehow it still ends up "costing the economy" 5x10^6 USD in "lost productivity". The underlying assumptions here are so flawed that the result cannot be taken seriously.  Nevertheless, this kind of analysis will be dragged out every time there's a news story--or better yet, a trial--about an email worm.)
</p><p>
The real moral of this story isn't about innumeracy in the press, or spotlight seekers exploiting said innumeracy. It's about multipliers, and the very real effect they can have.
</p><p>
Suppose you have a decision to make about a particular feature. You can do it the easy way in about a day, or the hard way in about a week. (Hypothetical.)  Which way should you do it? Suppose that the easy way makes four new fields required, whereas doing it the hard way makes the program smart enough to handle incomplete data.  Which way should you do it?
</p><p>
Required fields seem innocuous, but they are always an imposition on the user.  They require the user to gather more information before starting their jobs.  This in turn often means they have to keep their data on Post-It notes until they are ready to enter it, resulting in lost data, delays, and general frustration.
</p><p>
Let's consider an analogy. Suppose I'm putting a sign up on my building. Is it OK to mount the sign six feet up on the wall, so that pedestrians have to duck or go around it? It's much easier for me to hang the sign if I don't have to set up a ladder and scaffold. It's only a minor annoyance to the pedestrians. It's not like it would block the sidewalk or anything. All they have to do is duck.  So, I get to save an hour installing the sign, at the expense of taking two seconds away from every pedestrian passing my store.  Over the long run, all of those two second diversions are going to add up to many, many times more than the hour that I saved.
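</p><p>
To put rough numbers on the sign, here is a small Python sketch.  The one hour saved and the two seconds per pedestrian come from the example above.  The foot-traffic figure of 500 people a day is purely an assumption for illustration.
</p>

```python
def break_even_passersby(hours_saved, seconds_cost_each):
    """Number of passersby at which the time imposed equals the time saved."""
    return hours_saved * 3600 / seconds_cost_each


def total_hours_imposed(passersby, seconds_cost_each):
    """Total hours taken from everyone who had to duck."""
    return passersby * seconds_cost_each / 3600


# One hour saved, two seconds per pedestrian (figures from the example).
print(break_even_passersby(1, 2))  # 1800.0

# Hypothetical foot traffic: 500 people a day for one year.
print(total_hours_imposed(500 * 365, 2))
```

<p>
After the 1,800th pedestrian, the sign has cost other people more time than it saved me, and every pedestrian after that adds to the deficit.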
</p><p>
It's not ethical to worsen the lives of others, even a small bit, just to make things easy for yourself.  Successful software is measured in millions of people.  Every requirements decision you make is an imposition of your will on your users' lives, even if it is a tiny one.  Always be mindful of the impact your decisions--even small ones--have on those people. You should be willing to bear large burdens to ease the burden on those people, even if your impact on any given individual is minuscule.
</p>
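<p>To put rough numbers on the sign analogy, here is a minimal sketch in Python. The function names and the foot-traffic figure of 500 pedestrians a day are assumptions made up for illustration, not measurements.</p>

```python
def break_even_count(saved_seconds: int, cost_per_person_seconds: int) -> int:
    """Smallest head count whose combined cost exceeds the one-time saving."""
    return saved_seconds // cost_per_person_seconds + 1


def total_cost_hours(people_per_day: int, days: int, seconds_each: float) -> float:
    """Cumulative time taken from everyone affected, in hours."""
    return people_per_day * days * seconds_each / 3600


# One hour saved on installation versus two seconds per pedestrian.
print(break_even_count(3600, 2))
# Hypothetical foot traffic: 500 pedestrians a day for one year.
print(total_cost_hours(500, 365, 2))
```

<p>Under those assumptions, the hour I saved is used up after 1,801 passers-by, and a single year of traffic takes roughly 101 hours from other people.</p>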
                                 
]]></content></entry><entry><title>Keep Your Secrets</title><link href="https://michaelnygard.com/blog/2002/12/keep-your-secrets/"/><id>https://michaelnygard.com/blog/2002/12/keep-your-secrets/</id><published>2002-12-30T22:10:00-06:00</published><updated>2002-12-30T22:10:00-06:00</updated><content type="html"><![CDATA[<p>Here's a system I call &quot;KeepYourSecrets.org&quot;.  Recall a film noir detective telling the criminal mastermind that unless he drops a postcard in the mail in the next three days, all the details will go straight to the newspaper.</p>  <p>You can upload any kind of file -- it's all treated like binary.  You can set some parameters like a distribution list and a checkin frequency.  The system uses an IRC-like network to split your file into <em>n</em> parts, of which some <em>k</em> parts are needed to re-create the original.  Up to <em>n-k</em> parts can be lost without losing the whole, and an attacker needs at least <em>k</em> parts to compromise it.  (See &quot;Applied Cryptography&quot; for details.)  With lots of hosts, you can split a document into multiple overlapping sets of pieces to provide another layer of resiliency against damage.</p>  <p>From then on, if you do not check in with the network on some periodic basis, the document goes out to the distribution list.  NYTimes, Washington Post, CIA, whoever is on the distribution list for your file.</p>  <p>The network of servers doesn't ever have to know who you are.  They just need to know that you hold the private key that matches the public key that was used to upload the package.</p>  <p>It's possible to construct voting algorithms that the servers can use to decide if you have really checked in or not.  This lets the network protect against a single compromised or hostile host.  (You have to be resilient against hostile implementations.)</p>  <p>Because the hosts all communicate via some pub/sub or relay-chat protocol (Jabber, maybe?), the networks of hosts can be self-forming and self-identifying.  
If there is no central point of control, then the network as a whole cannot be stopped, subverted or forced to give up secrets by any single agency.</p>  <p>What you end up with is a secure, anonymous drop box that cannot be blocked, traced, or infiltrated.  It is self-forming and highly resilient to the loss of constituent pieces.</p>
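<p>The <em>k</em>-of-<em>n</em> split can be sketched in a few lines of Python. The post doesn't say which scheme it has in mind, so this assumes Shamir's secret sharing over a prime field; the names <code>split_secret</code> and <code>recover_secret</code> and the choice of prime are illustrative, and a real system would split a file chunk by chunk rather than one integer.</p>

```python
import secrets

# Assumption: the secret is an integer smaller than this Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them recover it."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 from k or more distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, n=5, k=3)
print(recover_secret(shares[:3]) == 123456789)
```

<p>Any three of the five shares rebuild the secret, so two hosts can vanish without losing it. Handing out shares, check-ins, and voting among servers are all outside this sketch.</p>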
]]></content></entry><entry><title>The Paradox of Honor</title><link href="https://michaelnygard.com/blog/2002/10/the-paradox-of-honor/"/><id>https://michaelnygard.com/blog/2002/10/the-paradox-of-honor/</id><published>2002-10-22T21:59:00-05:00</published><updated>2002-10-22T21:59:00-05:00</updated><content type="html"><![CDATA[<p>You can use a person's honor against him only if he values honor.  Only the honest man is threatened by the pointed finger.  The liar is unaffected by that kind of accusation.  I think it is because there is no such thing as &quot;dishonesty&quot;.  There is only honesty or its lack.  Not a thing and its opposite, but a thing and its absence.  One or zero, not one or minus-one.  One who is lacking a thing cannot be threatened by the prospect of its loss.</p><br /> 
]]></content></entry><entry><title>I think I'd like to</title><link href="https://michaelnygard.com/blog/2002/10/i-think-id-like-to/"/><id>https://michaelnygard.com/blog/2002/10/i-think-id-like-to/</id><published>2002-10-22T21:47:00-05:00</published><updated>2002-10-22T21:47:00-05:00</updated><content type="html"><![CDATA[<p>I think I'd like to do some Smalltalk (or Squeak) development sometime.  Just for myself.  It would be good for me -- like an artist going to a retreat and setting aside all notions of practicality.  I know I'll never work in Squeak professionally.  That's why it would be like saying to yourself, &quot;In this now, purity of expression is all that matters.  Tomorrow, I will worry about making something I can sell.  Tomorrow I will design so the mediocre masses that follow me cannot corrupt it.  Today, I will work for the joy I find in the work.&quot; </p><p> The burdens of responsibility leave no room for such indulgence.  So I turn back to Java and C#.  I'll write another Address class and deal with another session manager, and more cookies.  Always with the cookies.</p><br /> 
]]></content></entry><entry><title>Nostalgia</title><link href="https://michaelnygard.com/blog/2002/09/nostalgia/"/><id>https://michaelnygard.com/blog/2002/09/nostalgia/</id><published>2002-09-16T13:19:00-05:00</published><updated>2002-09-16T13:19:00-05:00</updated><content type="html">&lt;p>This kind of thing makes me wish I were back at Caltech.&lt;/p>&lt;br /></content></entry><entry><title>Bill Joy Knocks the Open Source Business Model</title><link href="https://michaelnygard.com/blog/2002/08/bill-joy-knocks-the-open-source-business-model/"/><id>https://michaelnygard.com/blog/2002/08/bill-joy-knocks-the-open-source-business-model/</id><published>2002-08-16T11:29:00-05:00</published><updated>2002-08-16T11:29:00-05:00</updated><content type="html"><![CDATA[<p>Bill Joy <a href="http://zdnet.com.com/2100-1104-949812.html">had some doubts</a> to voice about Linux. Of course, like so many others he immediately jumps to the wrong conclusion. &quot;The open-source business model hasn't worked very well,&quot; he says.</p> <p> Tough nuts. Here's the point that seems to get missed over and over again. There is no &quot;open source business model&quot;. There never was, and I doubt there ever will be. It doesn't exist. It's a contradiction in terms. </p><p> Open source needs no business model. </p><p> Look, GNU existed before anyone ever talked about &quot;open source&quot;.  Linux was built before there were <em>companies</em> like RedHat and IBM interested (let alone Sun). The thing that the corps and the pundits cannot seem to grasp is their absolute <em>irrelevance</em>. </p><p> It's like Bruce Sterling's speech. Harangue. Whatever you want to call it. I see it as yet another person getting up and trying to tell the &quot;open-source community&quot; what they need to do. Getting on their case about not being organized enough... or something. 
</p><p> Or it's like those posters on Slashdot who wish either GNOME or KDE would shut down so everyone can focus on one &quot;standard&quot; desktop. </p><p> Or Scott McNealy, lamenting the fact that open source Java application servers inhibit the expenditure of dollars that could be used to market J2EE against .Net. </p><p> Or the UI designers who froth at the mouth about how terrible an open source application's user interface may be. They say moronic things like &quot;when will coders learn that they shouldn't design user interfaces?&quot; (Or the more extreme form, &quot;Programmers should never design UIs.&quot;) </p><p> Or it's like anyone who looks at an application and says, &quot;That's pretty good.  You know what you really need to do?&quot; </p><p> None of these people get the true point.  I'll say it here as baldly as I can. </p><p>There is nobody in charge.  Not IBM, not Linus Torvalds, not Richard Stallman.  Nobody. </p><p> All you will find is an anarchic collection of self-interested individuals. Sometimes they collaborate. Some of them work together, some work apart, some work against each other. To the extent that some clusters of individuals share a vision, they collaborate to tackle bigger, cooler projects. </p><p> There is no one in control. Nobody gets to decree what open source projects live or die, or what direction they go in. These projects are an expression of free will, created by those capable of expressing themselves in that medium. Decisions happen in code, because coders make them happen. </p><p> Free will, baby. It's my project, and I'll do what I want with it. If I want to create the most god-awful user interface ever seen by Man, that's my prerogative. (If I want lots of users, I probably won't do that, but who says I have to want lots of users? It's my choice!) </p><p> As long as one GNOME hacker wants to keep working on GNOME, it will continue to evolve. As long as one Linux kernel hacker keeps coding, Linux will continue. 
None of these things require corporations, IPOs, or investment dollars to continue. The only true investments in open source are time and brainpower. Money is useful in that it can be used to purchase time, the greatest gift you can give a coder. Corporations are useful in that they are effective at aggregating and channeling money. &quot;Useful&quot;, not &quot;required&quot;. </p><p> As long as coders have free will and the tools to express it, open source software will continue. In fact, even if you take away their tools, they'll build new ones! To truly kill open source software, you must kill free will itself. </p><p> (And, by the way, <a href="http://www.cl.cam.ac.uk/%7Erja14/tcpa-faq.html">there</a> are those who <a href="http://www.dvdcca.org/">want</a> to <a href="http://www.mpaa.org/">do</a> exactly that.)  </p>  
]]></content></entry><entry><title>Needles, Haystacks</title><link href="https://michaelnygard.com/blog/2002/07/needles-haystacks/"/><id>https://michaelnygard.com/blog/2002/07/needles-haystacks/</id><published>2002-07-22T22:09:00-05:00</published><updated>2002-07-22T22:09:00-05:00</updated><content type="html"><![CDATA[<p>So, this may seem a little off-topic, but it comes round in the end.  Really, it does.</p> <p> I've been aggravated with the way members of the fourth estate have been treating the supposed &quot;information&quot; that various TLAs had before the September 11 attacks.  (That used to be my birthday, by the way.  I've since decided to change it.)  We hear that four or five good bits of information scattered across the hundreds of FBI, CIA, NSA, NRO, IRS, DEA, INS, or IMF offices &quot;clearly indicate&quot; that terrorists were planning to fly planes into buildings.  Maybe so.  Still, it doesn't take a doctorate in complexity theory to figure out that you could probably find just as much data to support any conclusion you want.  I'm willing to bet that if the same amount of collective effort were invested, we could <strong>prove</strong> that the U. S. Government has evidence that Saddam Hussein and aliens from Saturn are going to land in Red Square to re-establish the Soviet Union and launch missiles at Guam. </p><p> You see, if you already have the conclusion in hand, you can sift through mountain ranges of data to find those bits that best support your conclusion.  That's just hindsight.  It's only good for gossipy hens clucking over the backyard fence, network news anchors, and not-so-subtle innuendos by Congresscritters.   </p><p> The trouble is, it doesn't work in reverse.  How many documents does just the FBI produce every day?  10,000?  50,000?  How would <em>anyone</em> find exactly those five or six documents that really matter <em>and ignore all of the chaff</em>?  That's the job of analysis, and it's damn hard.  
A priori, you could only put these documents together and form a conclusion through sheer dumb luck.  No matter how many analysts the agencies hire, they will always be crushed by the tsunami of data. </p><p> Now, I'm not trying to make excuses for the alphabet soup gang.  I think they need to reconsider some of their basic operations.  I'll leave questions about separating counter-intelligence from law enforcement to others.  I want to think about harnessing randomness.  You see, government agencies are, by their very nature, bureaucratic entities.  Bureaucracies thrive on command-and-control structures.  I think it comes from protecting their budgets.  Orders flow down the hierarchy, information flows up.  Somewhere, at the top, an omniscient being directs the whole shebang.  A command-and-control structure hates nothing more than randomness.  Randomness is noise in the system, evidence of inadequate procedures.  A properly structured bureaucracy has a big, fat binder that defines who talks to whom, and when, and under what circumstances. </p><p> Such a structure is perfectly optimized to ignore things.  Why?  Because each level in the chain of command has to summarize, categorize, and condense information for its immediate superior.  Information is lost at every exchange.  Worse yet, the chance for somebody to see a pattern is minimized.  The problem is this whole idea that information flows toward a converging point.  Whether that point is the head of the agency, the POTUS, or an army of analysts in Foggy Bottom, they cannot assimilate everything.  There isn't even any way to build information systems to support the mass of data produced every day, let alone correlating reports over time. </p><p> So, how do Dan Rather and his cohorts find these things and put them together?  Decentralization.  There are hordes of pit-bull journalists just waiting for the scandal that will catapult them onto CNN.  
(&quot;Eat your heart out Wolf, <strong>I</strong> found the smoking gun first!&quot;)   </p><p> Just imagine if every document produced by the Minneapolis field office of the FBI were sent to every other FBI agent and office in the country.  A vast torrent of data flowing constantly around the nation.  Suppose that an agent filing a report about suspicious flight school activity could correlate that with other reports about students at other flight schools.  He might dig a little deeper and find some additional reports about increased training activity, or a cluster of expired visas that overlap with the students in the schools.  In short, it would be a lot easier to correlate those random bits of data to make the connections.  Humans are amazing at detecting patterns, but they have to see the data first! </p><p> This is what we should focus on.  Not on rebuilding the $6 Billion Bureaucracy, but on finding ways to make available all of the data collected today.  (Notice that I haven't said <strong>anything</strong> that requires weakening our 4th or 5th Amendment rights.  This can all be done under laws that existed <strong>before</strong> 9/11.)  Well, we certainly have a model for a global, decentralized document repository that will let you search, index, and correlate all of its contents.  We even have technologies that can induce membership in a set.  I'd love to see what Google Sets would do with the 19 hijackers' names, after you have it index the entire contents of the FBI, CIA, and INS databases.  Who would it nominate for membership in <strong>that</strong> set? </p><p> Basically, the recipe is this: move away from ill-conceived ideas about creating a &quot;global clearinghouse&quot; for intelligence reports.  Decentralize it.  Follow the model of the Internet, Gnutella, and Google.  Maximize the chances for field agents and analysts to be exposed to that last, vital bit of data that makes a pattern come clear.  
Then, when an agent perceives a pattern, make damn sure the command-and-control structure is ready to respond.</p>  
]]></content></entry><entry><title>MLP</title><link href="https://michaelnygard.com/blog/2002/07/mlp/"/><id>https://michaelnygard.com/blog/2002/07/mlp/</id><published>2002-07-11T15:16:00-05:00</published><updated>2002-07-11T15:16:00-05:00</updated><content type="html">&lt;p>Here's a good roundup of recent traffic regarding REST.&lt;/p>&lt;br /></content></entry><entry><title>Here's my number one frustration</title><link href="https://michaelnygard.com/blog/2002/06/heres-my-number-one-frustration/"/><id>https://michaelnygard.com/blog/2002/06/heres-my-number-one-frustration/</id><published>2002-06-24T12:44:00-05:00</published><updated>2002-06-24T12:44:00-05:00</updated><content type="html"><![CDATA[<p>Here's my number one frustration with the state of the industry today. I am a professional.  I regard my work as a craft to be studied and learned. Yet, in most domains, there is no benefit to developing a high level of skill.  You end up surrounded by people who don't understand a word you say, can't work at that level, and don't really give a damn. They'll get the same rewards and go home happy at 5:00 every day. It's like, once you achieve a base level of mediocrity, there's no benefit for further personal development. In fact, there's a distinct <strong>disadvantage</strong>, in that you end up pulling ridiculous hours to clean up their garbage.</p>  <p>Bah, there I go being bitter again. Maybe I just need to work in some other domain--one where skills count for something, and being good at your job is a benefit, not a hindrance. I'm sick of writing Address classes, anyway.  </p><br /> 
]]></content></entry><entry><title>Multiplier Effects</title><link href="https://michaelnygard.com/blog/2002/06/multiplier-effects/"/><id>https://michaelnygard.com/blog/2002/06/multiplier-effects/</id><published>2002-06-08T22:28:00-05:00</published><updated>2002-06-08T22:28:00-05:00</updated><content type="html"><![CDATA[<p>Here's another way to think about the ethics of software, in terms of multipliers.  Think back to the last major virus scare, or when Star Wars Episode II was released.  Some &quot;analyst&quot;--who probably found his certificate in a box of Cracker Jack--published some ridiculous estimate of damages.</p><p> BTW, I have to take a minute to disassemble this kind of analysis.  Stick with me, it won't take long.</p><p> If you take 1.5 seconds to delete the virus, it costs nothing.  It's an absolutely immeasurable impact to your day.  It won't even affect your productivity.  You will probably spend more time than that discussing sports scores, going to the bathroom, chatting with a client, or any of the hundreds of other things human beings do during a day.  It's literally lost in the noise.  Nevertheless, some peabrain analyst who likes big numbers will take that 1.5 seconds and multiply it by the millions of other users and their 1.5 seconds, then multiply that by the &quot;national average salary&quot; or some such number. </p><p> So, even though it takes you longer to blow your nose than to delete the virus email, somehow it still ends up &quot;costing the economy&quot; 5x10^6 USD in &quot;lost productivity&quot;.  The underlying assumptions here are so thoroughly rotten that the result cannot be anything but a joke.  Sure as hell though, you'll see this analysis dragged out every time there's a news story--or better yet, a trial--about an email worm.</p> <p>The real moral of this story isn't about innumeracy in the press, or spotlight seekers exploiting innumeracy.  
It's about multipliers.</p> <p>Suppose you have a decision to make about a particular feature.  You can do it the easy way in about a week, or the hard way in about a month.  (Hypothetical.)  Which way should you do it?  Suppose that the easy way makes the user click an extra button, whereas doing it the hard way makes the program a bit smarter and saves the user one click.  Just one click.  Which way should you do it?</p> <p>Let's consider an analogy.  Suppose I'm putting a sign up on my building.  Is it OK to mount the sign six feet up on the wall, so that pedestrians have to duck or go around it?  It's much easier for me to hang the sign if I don't have to set up a ladder and scaffold.  It's only a minor annoyance to the pedestrians.  It's not like it would block the sidewalk or anything.  All they have to do is duck.  (We'll just ignore the fact that pissing off all your potential customers is not a good business strategy.)</p> <p>It's not ethical to worsen the lives of others, even a small bit, just to make things easy for yourself.  These days, successful software is measured in millions of users, of people.  Always be mindful of the impact your decisions--even small ones--have on those people.  Accept large burdens to ease the burden on those people, even if your impact on any given individual is minuscule.  The cumulative good you do that way will always overwhelm the individual costs you pay.</p>  
]]></content></entry><entry><title>REST and Change in APIs</title><link href="https://michaelnygard.com/blog/2002/05/rest-and-change-in-apis/"/><id>https://michaelnygard.com/blog/2002/05/rest-and-change-in-apis/</id><published>2002-05-14T11:34:00-05:00</published><updated>2002-05-14T11:34:00-05:00</updated><content type="html"><![CDATA[<p>In case it didn't come through, I'm intrigued by REST, because it seems more fluid than the WS-* specifications.  I can do an HTTP request in about 5 lines of socket code in <strong>any</strong> modern language, from <strong>any</strong> client device.</p>  <p>The WS-splat crowd seem to be building YABS (yet another brittle standard).  Riddle me this: what use is a service description in a standardized form if there is only one implementor of that service?  WSDL only attains full value when there are standards built <em>on top of WSDL</em>.  Just like XML, WSDL is a meta-standard.  It is a standard for specifying other standards.  Collected and diverse industry behemoths and leviathans make the rules for <em>that</em> playground.</p> <p>I see two, equally likely, outcomes for any given service definition:</p>  <ul> <li>A defining body will standardize the interface for a particular web service.  This will take far too long.</li> <li>A dominant company in a star-like topology with its customers and suppliers (think Wal-mart) will impose an interface that its business partners must use.</li> </ul>  <p>Once such interfaces are defined, how easily might they be changed?  I mean the WSDL (or other) definition of the service itself.  Can anyone say CORBAservices? You'd better define your services right the first time, because there appears to be substantial friction opposing change.</p>  <p>How does REST avoid this issue?  By eliminating layers.  
If I support a URI naming scheme like http://<em>company</em>.com/<em>groupName</em>/<em>divisionName</em>/<em>departmentName</em>/purchaseOrders/<em>poNumber</em> as a RESTful way to access purchase orders, and I find that we need to change it to /purchaseOrders/<em>departmentNumber</em>/<em>poNumber</em>, then both forms can co-exist.  The alternative change in SOAP/WSDL-land would either modify the original endpoint (an incompatible change!) or would define a <em>new</em> service to support the new mode of lookup.  (I suppose other hacks are available, too.  Service.getPurchaseOrder2() or Service.getPurchaseOrderNew() for example.)</p>  <p>Of course, neither of these service architectures is implemented widely enough to really evaluate which one will be more accepting of change.  I can tell you, though, that one of the huge CORBA-killers was the slow pace and resistance to change in the CORBAservices.</p>  
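<p>The co-existence of the two URI forms can be sketched as a tiny path router in Python. The <code>ORDERS</code> table, the <code>lookup</code> function, and the sample paths are hypothetical, made up for illustration; this is a sketch of the idea, not how any particular server does it.</p>

```python
import re

# Hypothetical data store keyed by purchase order number.
ORDERS = {"4711": {"department": "42", "total": "199.00"}}

# Original scheme: /groupName/divisionName/departmentName/purchaseOrders/poNumber
OLD_FORM = re.compile(
    r"^/(?P<group>[^/]+)/(?P<division>[^/]+)/(?P<department>[^/]+)"
    r"/purchaseOrders/(?P<po>[^/]+)$"
)
# Newer scheme: /purchaseOrders/departmentNumber/poNumber
NEW_FORM = re.compile(r"^/purchaseOrders/(?P<department>[^/]+)/(?P<po>[^/]+)$")


def lookup(path: str):
    """Resolve either URI form to the same purchase order, or None."""
    for pattern in (NEW_FORM, OLD_FORM):
        match = pattern.match(path)
        if match:
            return ORDERS.get(match.group("po"))
    return None


old_style = lookup("/acme/retail/toys/purchaseOrders/4711")
new_style = lookup("/purchaseOrders/42/4711")
print(old_style == new_style)
```

<p>Clients that bookmarked the old form keep working, and new clients can use the shorter one. Nothing about the old name had to change to introduce the new one.</p>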
]]></content></entry><entry><title>Here's another excellent discussion about</title><link href="https://michaelnygard.com/blog/2002/05/heres-another-excellent-discussion-about/"/><id>https://michaelnygard.com/blog/2002/05/heres-another-excellent-discussion-about/</id><published>2002-05-09T10:50:00-05:00</published><updated>2002-05-09T10:50:00-05:00</updated><content type="html"><![CDATA[<p><a href="http://www.xml.com/pub/a/2002/05/08/deviant.html">Here's</a> another excellent discussion about REST for web services.</p><br /> 
]]></content></entry><entry><title>Debating "Web Services"</title><link href="https://michaelnygard.com/blog/2002/05/debating-web-services/"/><id>https://michaelnygard.com/blog/2002/05/debating-web-services/</id><published>2002-05-07T21:30:00-05:00</published><updated>2002-05-07T21:30:00-05:00</updated><content type="html"><![CDATA[<p>There is a huge and contentious debate under way right now related to &quot;Web services&quot;.  A sizable contingent of the W3C and various XML pioneers are challenging the value of SOAP, WSDL, and other &quot;Web service&quot; technology.</p>  <p>This is a nuanced discussion with many different positions being taken by the opponents.  Some are critical of the W3C's participation in something viewed as a &quot;pay to play&quot; maneuver from Microsoft and IBM.  Others are pointing out serious flaws in SOAP itself.  To me, the most interesting challenge comes from the W3C's Technical Architecture Group (TAG).  This is the group tasked with defining what the web is and is not.  Several of the TAG, including the president of the Apache Foundation, are arguing that &quot;Web services&quot; as defined by SOAP, fundamentally are not &quot;the web&quot;.  (&quot;The web&quot; being defined crudely as &quot;things are named via URI's&quot; and &quot;every time I ask for the same URI, I get the same results&quot;.  My definition, not theirs.)  With a &quot;Web service&quot;, a URI doesn't name a thing, it names a process.  What I get when I ask for a URI is no longer dependent solely on the state of the thing itself.  Instead, what I get depends on my path through the application.</p>  <p>I'd encourage you all to sample this debate, as summarized by <a href="http://www.advogato.com/article/464.html">Simon St. Laurent</a> (one of the original XML designers).</p><br /> 
]]></content></entry><entry><title>Decoupling</title><link href="https://michaelnygard.com/blog/2002/05/decoupling/"/><id>https://michaelnygard.com/blog/2002/05/decoupling/</id><published>2002-05-06T09:46:00-05:00</published><updated>2002-05-06T09:46:00-05:00</updated><content type="html">&lt;p>For the ultimate in temporal, architectural, language and spatial decoupling, try two of my favorite fluid technologies: publish-subscribe messaging and tuple-spaces.&lt;/p>&lt;br /></content></entry><entry><title>Prison of our own Making</title><link href="https://michaelnygard.com/blog/2002/04/prison-of-our-own-making/"/><id>https://michaelnygard.com/blog/2002/04/prison-of-our-own-making/</id><published>2002-04-19T21:49:00-05:00</published><updated>2002-04-19T21:49:00-05:00</updated><content type="html">&lt;p>We who build worlds dwell in a dank and dismal prison of our own construction, though not our design. Why so difficult? Where is the green grass? Where is the sunshine?&lt;/p>&lt;br /></content></entry><entry><title>Ethical decisions in software development</title><link href="https://michaelnygard.com/blog/2002/04/ethical-decisions-in-software-development/"/><id>https://michaelnygard.com/blog/2002/04/ethical-decisions-in-software-development/</id><published>2002-04-04T22:19:00-06:00</published><updated>2002-04-04T22:19:00-06:00</updated><content type="html"><![CDATA[<p>Ethical decisions in software development do not only arise when we are talking about malware or copyright infringement.</p>
<p>If my programs are successful, then they impact the lives of thousands or millions of people.  That impact can be positive or negative.  The program can make their lives better or worse&ndash;even if just in minute proportions.</p>
<p>Every time I make a decision about how a program behaves, I am really deciding what my users can and cannot do.  If I make an input required, I am forcing them to abide by my rules.  (Hopefully, it is a rule they expressed first, at least.)  Conversely, if I allow partial entry, then I am allowing some licentiousness.  They can get away with less rigorous work.</p>
<p>That makes every programming decision an <em>ethical</em> decision.</p>
]]></content></entry><entry><title>Designing for Emergent Behavior</title><link href="https://michaelnygard.com/blog/2002/03/designing-for-emergent-behavior/"/><id>https://michaelnygard.com/blog/2002/03/designing-for-emergent-behavior/</id><published>2002-03-25T22:44:00-06:00</published><updated>2002-03-25T22:44:00-06:00</updated><content type="html"><![CDATA[<p>Lately, I&rsquo;ve been grooving on emergent behavior.  This fuzzy term comes from the equally fuzzy field of complexity studies.  Mix complex rules together with non-linear effects (like humans) and you are likely to observe emergent behavior.</p>
<p>Recent example: web browser security holes.  Any program inherently constitutes a complex system.  Add in some dynamic reprogramming, downloadable code, system-level scripting, and millions upon millions of users and you&rsquo;ve got a perfect petri dish.  Sit back and watch the show.  Unpredictable behavior will surely result.</p>
<p>In fact, &quot;emergent&quot; sometimes gets used as a synonym for &quot;unpredictable&quot;.  By and large, I believe that&rsquo;s true.  In traditional systems design, &quot;unpredictable&quot; definitely equals &quot;sloppy&quot;.  Command-and-control, baby.  Emergent behavior is what happens when your program goes off the rails.</p>
<p>The thing is, emergent behavior is where all the really interesting things happen.  Predictable programs are boring.  Big batch runs are predictable.</p>
<p>But, you have to consider the complete system.  In a big batch run, the system is linear: inputs, transformation, outputs.  No feedback.  No humans.  When you include humans in your view of the system, all these messy feedback loops start to appear.  It gets even worse when you have multiple humans connected via the programs.  Feedback loops that stretch from one person, through at least two programs, out to another person and back.</p>
<p>Any system that involves humans <strong>will</strong> exhibit emergent behaviors &ndash; and this is a very good thing.</p>
<p>Are &quot;designed&quot; behavior and &quot;emergent&quot; behavior inherently incompatible?  I don&rsquo;t think so.  I think it may be possible to <em>design for emergent behavior</em>.  I mean that certain designs will encourage some kinds of emergent behavior, whereas other designs encourage other kinds of emergent behavior.  We can study the behaviors produced by various systems and designs to build a compendium of factors that are likely to facilitate one class of behavior or another.</p>
<p>For example: In every corporation, I see large volumes of data stored and shared in two different formats.  The nature of the two systems encourages very different behaviors.</p>
<p>First we have relational databases.  These tend to be large, expensive systems.  As a result, they are centralized to one degree or another.  The nature of relational algebra is that of a static schema.  Therefore, changes are rigidly controlled.  Centralized, rigidly controlled assets require guardians (DBAs) and  gatekeepers (data modelers).  Because the schema is well-defined and changes slowly, the database gains a degree of transparency.  Applications are integrated through their databases.  Generic tools for backup, reporting, extraction, and modeling become possible.  The data can be accessed from a variety of applications in a relatively generic fashion.</p>
<p>The other data storage tool I see used widely is the spreadsheet.  I almost never see a spreadsheet used to calculate numbers.  Instead, most are used as a schema-less data storage tool.  Often created directly by the business analysts, these spreadsheets are very conducive to change.  Sharing is as simple as sending the file through email.  Of course, this leads to version conflicts and concurrent update issues that have to be settled by hand (usually by printing a timestamp on the hardcopies!)  There is not a central definition of the data structure.  Indeed, neither the data nor the structures from spreadsheets can be reused.  A spreadsheet makes the 2-dimensional structure of a table obvious, but it makes relationships difficult, if not impossible, to represent.  Ergo, spreadsheet users don&rsquo;t do relationships.  Access to the spreadsheets is always mediated by a single application.</p>
<p>So, two different systems.  Both store structured (or at least semi-structured) data.  The nature of each produces very different emergent behaviors.  In one case, we find the evolution of acolytes of the RDBMS.  In the other case, we find that a numeric analysis tool is being used for widespread data storage and sharing.</p>
<p>Given enough examples, enough time, and enough study, can we not learn to extrapolate from the essential nature of our designs to the most probable emergent behaviors?  Perhaps even to select the emergent behaviors that we desire first and, starting from those, decide what essential nature our designs must embody to be most likely to encourage those behaviors?</p>
]]></content></entry><entry><title>Names have Power</title><link href="https://michaelnygard.com/blog/2002/03/names-have-power/"/><id>https://michaelnygard.com/blog/2002/03/names-have-power/</id><published>2002-03-19T23:11:00-06:00</published><updated>2002-03-19T23:11:00-06:00</updated><content type="html"><![CDATA[<p>
Names have power.  Shamanic primitives guard their true names -- give me your name and you give me power over you.  In the ether, your name is your only identity.  Give away your name and you give away yourself.  No cause, issue, or crusade has a follower until it has a name.  A good name evokes images, emotions.  A well-named issue becomes uncontestable.  (Who is <em>really</em> opposed to &quot;family values&quot;, anyway?)
</p>

<p>
Naming things well may be one of the hardest jobs in design.  Somebody once said that object-oriented design was about creating the language that you would use to solve the problem. Start with the language (a collection of names, and rules about how to assemble the names), then deal with the  problem.
</p>

<p>
I'm struggling with naming something right now.  I can sense what it is.  There is a real thing there.  I can feel it.  I need to define it, give it boundaries.  When I can name it, I will give it life.
</p>

<blockquote>
Find the line, find the shape<br />
Through the grain<br />
Find the outline, things will<br />
Tell you their name<br />
--Suzanne Vega
</blockquote>

<p>
The best name I've come up with yet is <em>fluid</em>.  There are fluid methods, fluid tools, fluid technologies, fluid designs, and so on.  Things that are fluid welcome change.  They adapt.  They are pleasant to modify.  If I have a fluid architecture, then integrating a new system into the mix does not cause massive headaches and heartburn.  (Hmmm.  So dropping a new system into a fluid architecture doesn't cause a ripple effect?  Right.  See how hard it is to name things?)  Fluid &quot;stuff&quot; does not resist change.  Being fluid means nothing is ever carved in stone.  Things that are fluid encourage certain emergent properties that we value: fast, flexible, joyous.
</p>

<p>
Pah.  That's damn close to gibberish.
</p>

<p>
Let's try analogy and contrast:
</p>

<table width="75%" cellspacing="0" cellpadding="0" border="0">
  <tbody>
<tr><td><strong>Fluid</strong></td><td><strong>Not fluid</strong></td></tr>
<tr><td>Publish-subscribe messaging</td><td>Flat file integration</td></tr>
<tr><td>Typeless languages</td><td>Strongly-typed languages</td></tr>
<tr><td>Tuple-spaces</td><td>Relational databases</td></tr>
<tr><td>eXtreme Programming</td><td>SEI CMM Level 5</td></tr>
<tr><td>Cross-functional teams</td><td>Silos</td></tr>
<tr><td>Whiteboard task lists</td><td>Gantt charts</td></tr>
<tr><td>Web</td><td>Client-server</td></tr>
 <tr><td>20-person startup</td><td>The same company at 150 people</td></tr>
</tbody>
</table>  

<p>
Does that help?  The items on the left share some essential, underlying attributes.  The things on the right lack those attributes; they embody different values.  (I don't like the semiotics of &quot;fluid&quot;.  Call that a working title, not a true name.  Besides, the natural opposites of &quot;fluid&quot; would be &quot;solid&quot; or &quot;concrete&quot;.  These are both positively-connoted terms.)
</p>

<p>
So what can I name this quality?  Is there really something essential there, or does this reflect nothing more than the way I like to work?
</p> 
]]></content></entry><entry><title>Lately, I have been struggling</title><link href="https://michaelnygard.com/blog/2002/03/lately-i-have-been-struggling/"/><id>https://michaelnygard.com/blog/2002/03/lately-i-have-been-struggling/</id><published>2002-03-16T22:42:00-06:00</published><updated>2002-03-16T22:42:00-06:00</updated><content type="html"><![CDATA[<p>Lately, I have been struggling to find the meaning in my work.  I suppose that&rsquo;s not surprising.  I am a human being&ndash;a mortal creature.  My age will soon flip a decimal digit.  (I decline to specify which.)  These can certainly cause one to spend time reflecting on one&rsquo;s legacy.  They can also cause one to buy a flaming red sports car.  I  may explore that option later.</p>
<p>I also work in a field of incredible transience.  Two hundred years from now, no cathedral will bear my mark.  No train depot of my design will grace the National Register of Historic Places.  No literary critics will deconstruct the significance of my characters&rsquo; middle initials.  In truth, the shelf life of my work compares poorly to that of a gallon of milk.</p>
<p>I am a programmer.</p>
<p>I and my comrades can usually be found behind our glowing screens, working hour after hour to bring some other person&rsquo;s vision to life. We who grapple with chaos and ether and mud expend our spirit, energy, life, time, soul, and qi in the name of creation.  We work long after the managers have left.  We learn the janitors&rsquo; names.  I have often gazed out my window to the neon street below, full of the theater signs, restaurants, and wandering crowds seeking to be entertained.  I have wondered what kind of life I should have led to be in that crowd instead of watching it.  I&rsquo;ve wondered how I could rejoin that human mass.  I think I&rsquo;d have to change careers.</p>
<p>I cannot deny, however, that my work brings me deep&ndash;if ephemeral&ndash;satisfaction.  The harsh joy of self-sacrifice combines with the exultant delight of success when a project comes together. When I finally get my programs to work, it&rsquo;s a kind of magic, dense and layered.  At one level, the thought that my work will be useful to someone&ndash;that it will make dozens, hundreds, maybe millions of people more individually powerful&ndash;is heady and exciting.</p>
<p>At another level, I have a fierce pride that my software works at all. I know that my creation is strong enough, powerful enough to survive the threat of millions of users doing their damnedest to destroy it. Despite the teeming millions trying to prove that there is no such thing as &quot;foolproof&quot;, my software keeps working.  &quot;Robust&quot;, we call it.  &quot;Resilient&quot;.  &quot;Come on&quot;, it says, &quot;bring it on.&quot;</p>
<p>Deeper still, I take a craftsman&rsquo;s pride in a job well done.  Like a mason or a carpenter, I know what is under the surface.  I know how well it is put together.  I know what skill went into its construction.  No one else may see this, but I know.</p>
]]></content></entry></feed>