The Perils of Semantic Coupling - Wide Awake Developers

On the subject of maneuverability, many organizations run into trouble when they try to enter new lines of business, create a partnership, or merge with another company. Updating enterprise systems becomes a large cost factor in these business initiatives, sometimes large enough to outweigh the benefits case. This is a terrible irony: our automation provides efficiency, but removes flexibility.

If you break down the cost of such changes, you'll find it comes in equal parts from changes to individual systems and changes to integrations across systems. Integrations are always costly and full of risk, and never more so than when changing cardinalities. Partnerships and mergers pretty much always change cardinalities, too.

The cost factor arises from "semantic coupling." That is the coupling between services introduced because the services need to share concepts. It usually appears as data types or entity names that pop up in many services.

As an example, let's think about a tiny retailing system with a small set of what I'll call "macroservices". One of the most important entity types here is the Stock Keeping Unit, or SKU. It represents "a thing which can be sold". In a typical retail system, it has a large number of attributes that describe how the item is priced, delivered, displayed on the web, upsold and cross-sold, reviewed, categorized, and taxed.

SKUs are created in a master data management system. There may be a variety of feeds that get massaged into MDM, but we'll consider that to be outside the boundary of our interest for now. From MDM, the SKU must be distributed to a number of other services.

Each of these macroservices uses aspects of the SKU for its own purpose. Content management attaches "telling and selling" content to the SKU so it can be presented nicely on the web. Pricing adds it to the pricing rules. Shipping identifies the carriers, options, and costs to deliver it. Order management (probably a great big silver beast of a system) tracks inventory, orders, delivery rules, returns, and a lot more.

Now what happens if we have to make a major change to the SKU? Let's imagine that we want to change how we manage prices. In the past, merchants set prices on each item individually. Now, we've got too much in the catalog for that to scale so we introduce the idea of price points for digital items. A price point is a price that applies to a large number of SKUs. When we change the price point, all SKUs that refer to it should be changed at the same time. So, if we decide to reduce the price of a low-bitrate MP3 track from $0.99 to $0.89, we can just change a single price point record.
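To make the many-to-one relationship concrete, here is a minimal sketch in Python. The class and field names (PricePoint, Sku, sku_id, and so on) are my own assumptions for illustration, not the schema of any real retail system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PricePoint:
    """A shared price that many SKUs can refer to (hypothetical model)."""
    name: str
    amount: Decimal


@dataclass
class Sku:
    """Only the fields needed for this illustration."""
    sku_id: str
    price_point: PricePoint  # many SKUs refer to one price point

    @property
    def list_price(self) -> Decimal:
        # The SKU has no price of its own; it reads the shared price point.
        return self.price_point.amount


low_bitrate = PricePoint("low-bitrate-mp3", Decimal("0.99"))
tracks = [Sku(f"track-{n}", low_bitrate) for n in range(3)]

# One update to the price point record reprices every SKU that refers to it.
low_bitrate.amount = Decimal("0.89")
prices = [track.list_price for track in tracks]
```

Note that every system holding a Sku in this shape also has to understand PricePoint. That is the coupling the rest of this post is about.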

How many systems do we have to change for this new concept?

If we consider "price point" to be part of our core domain, then we have to add that concept everywhere. The surface area of that change is really large, and it will be a costly change to make. It might even be too costly to be worth doing. We could hire a small army of temp workers to update price records by hand twice a year and still come out ahead. That's not a very satisfying answer though. All this automation is supposed to make us more efficient! What good is it if we are stuck with outdated processes because our systems are too hard to change?

The key problem is semantic coupling. There are a lot of systems here that shouldn't need to care about the "price point" concept. It has no bearing on the digital locker, shipping, or ratings & reviews.

In this example, we can reduce the semantic coupling. Simply decide that "price point" is not a core concept. It is a detail of data management for the MDM system. Everything downstream receives SKUs with a list price. No downstream system should care how that list price was determined.

This decision flattens a many-to-one relationship from SKU to price point. In so doing, we get a huge benefit. We eliminate an entire entity and all references to it from all the downstream systems.
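Here is a rough sketch of what that flattening could look like at the MDM boundary. Everything named here (PublishedSku, publish, republish_for, the two dictionaries) is hypothetical; the only point is that the price point lookup happens before anything leaves MDM.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PublishedSku:
    """What downstream systems receive: no price point reference at all."""
    sku_id: str
    list_price: Decimal


# MDM-internal state (hypothetical): price points and SKU -> price point links.
price_points = {"pp-low-bitrate": Decimal("0.99")}
sku_price_point = {"track-1": "pp-low-bitrate", "track-2": "pp-low-bitrate"}


def publish(sku_id: str) -> PublishedSku:
    """Resolve the price point inside MDM and emit a flat list price."""
    amount = price_points[sku_price_point[sku_id]]
    return PublishedSku(sku_id=sku_id, list_price=amount)


def republish_for(price_point_id: str) -> list[PublishedSku]:
    """After a price point change, re-emit every SKU that refers to it."""
    return [
        publish(sku_id)
        for sku_id, pp_id in sorted(sku_price_point.items())
        if pp_id == price_point_id
    ]


# A merchant edits one record; downstream just sees new list prices.
price_points["pp-low-bitrate"] = Decimal("0.89")
updates = republish_for("pp-low-bitrate")
```

Under this sketch, a single price point change fans out as one ordinary SKU update per affected item. That is more messages, but no downstream system has to learn a new entity.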

I would even make a case for shattering the concept of SKU into multiple separate concepts. MDM may keep that concept. Downstream, though, each system has its own set of internal concepts. We should treat identifiers from other systems as opaque tokens that we map onto our own system's space.

For example, the pricing service doesn't need to know that it is pricing SKUs. It just needs to price "things that can be priced." I know, it sounds tautological, but I think we get misled as humans… we think of SKU as a unitary concept so we build it as such in our systems. But look what happens if we say a pricing service can price "stuff and things" as long as they have some mapping in the pricing service itself. We can add an entirely new universe of things to price, without forcing everything on Earth to be a SKU!
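As a minimal sketch of that idea, assume a pricing service that accepts a (source, token) pair from any upstream system and maps it onto its own identifier space. The PricingService class, its methods, and the "subscriptions" source are invented for illustration.

```python
from decimal import Decimal
from itertools import count


class PricingService:
    """Prices 'things that can be priced', whatever they are upstream."""

    def __init__(self) -> None:
        self._next_id = count(1)
        # (source system, opaque token) -> this service's own identifier
        self._ids: dict[tuple[str, str], int] = {}
        self._prices: dict[int, Decimal] = {}

    def register(self, source: str, token: str, price: Decimal) -> int:
        """Map an opaque external token onto our own ID space and price it."""
        key = (source, token)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        priceable_id = self._ids[key]
        self._prices[priceable_id] = price
        return priceable_id

    def price_of(self, source: str, token: str) -> Decimal:
        return self._prices[self._ids[(source, token)]]


pricing = PricingService()
# A SKU from MDM: the token is never parsed, only stored and matched.
track_id = pricing.register("mdm", "SKU-12345", Decimal("0.89"))
# Something that is not a SKU at all (hypothetical second source).
plan_id = pricing.register("subscriptions", "plan-gold", Decimal("9.99"))
```

Nothing in the service knows what a SKU is. Adding the second universe of priceable things took one more call to register, not a schema change.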

We should scrutinize each of the other services, asking ourselves, "Does this really care about a SKU? Or does it care about something that a SKU happens to possess?" I would argue that in each case, the service really cares about "Thing that can be Xed". Priced, taxed, shipped, reviewed, etc. Are SKUs the only things that can be taxed? Are they the only things that can be reviewed? Etc.
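One way to express "thing that can be taxed" in code is a narrow structural type that names only what the service reads. This is a sketch using Python's typing.Protocol; the Taxable fields, the SkuLine and DeliveryFee classes, and the tax rates are all made up for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


class Taxable(Protocol):
    """All the tax service needs to know about a thing (assumed fields)."""
    tax_category: str
    amount: Decimal


@dataclass
class SkuLine:
    sku_id: str
    tax_category: str
    amount: Decimal
    weight_kg: float  # a shipping concern; the tax service never looks at it


@dataclass
class DeliveryFee:
    """Not a SKU, but still taxable."""
    tax_category: str
    amount: Decimal


# Made-up rates, purely for illustration.
RATES = {"digital": Decimal("0.10"), "service": Decimal("0.00")}


def tax_due(item: Taxable) -> Decimal:
    """Compute tax from the two attributes the Taxable view exposes."""
    return (item.amount * RATES[item.tax_category]).quantize(Decimal("0.01"))


track_tax = tax_due(SkuLine("track-1", "digital", Decimal("0.89"), 0.0))
fee_tax = tax_due(DeliveryFee("service", Decimal("4.99")))
```

The tax function accepts anything with a tax category and an amount. A SKU qualifies because of what it happens to possess, not because of what it is.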

Iterate this process and four things will happen:

Your services will shrink.
Your services will become much more general.
Each service will own its own space of identifiers.
Your organization will become more maneuverable.

The key point I want to make here is that a concept may appear to be atomic just because we have a single word to cover it. Look hard enough and you will find seams where you can fracture that concept. Don't share the whole thing. Don't couple all your downstream systems to the whole concept, and definitely don't couple your downstream to a complex of related concepts! It's a cardinal sin.