Continuing my theme of filthy data.

A few years ago, there was a lot of excitement around clickstream analysis. This was the idea that, by watching a user's clicks around a website, you could predict things about that user.

What a backwards idea.

For any given user, you can imagine a huge number of plausible explanations for any given browsing session. You'll never enumerate all the use cases that motivate someone to spend ten minutes on seven pages of your website.

No, the user doesn't tell us much about himself by his pattern of clicks.

But the aggregate of all the users' clicks... that tells us a lot! Not about the users, but about how the users perceive our site. It tells us about ourselves!

A commerce company may consider two products to be related for any number of reasons. Deliberate cross-selling, functional alignment, interchangeability, whatever. Any such relationships we create between products in the catalog only reflect how we view our own catalog. Flip that around, though, and look at products that the users view as related. Every day, in every session, users are telling us that products have some relationship to each other.
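To make that concrete, here is a minimal sketch of counting how often two products turn up in the same session. Everything in it is an assumption for illustration: the `co_view_counts` name, the shape of the session data (a list of product IDs per session), and the made-up product names. It is one plausible way to tally the relationships, not a description of any particular system.

```python
from collections import Counter
from itertools import combinations

def co_view_counts(sessions):
    """Count, for each pair of products, the number of sessions containing both."""
    counts = Counter()
    for products in sessions:
        # set() so repeat views within one session count once;
        # sorted() so each pair has a single canonical ordering.
        for pair in combinations(sorted(set(products)), 2):
            counts[pair] += 1
    return counts

# Hypothetical sessions: each inner list is the products one visitor looked at.
sessions = [
    ["tent", "sleeping-bag", "stove"],
    ["tent", "sleeping-bag"],
    ["tent", "whoopee-cushion"],
    ["sleeping-bag", "tent", "tent"],
]

pair_counts = co_view_counts(sessions)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Notice that nothing here is keyed by user. The only thing being described is the catalog.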

Hmm. But, then, what about those times when I buy something for myself and something for my kids during the same session? Or when I got that prank gift for my brother?

Once you aggregate all that dirty data, weak connections like the prank gift will just be part of the background noise. The connections that stand out from the noise are the real ones, the only ones that ultimately matter.
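One way to separate signal from noise, sketched under some loud assumptions: keep only pairs seen together in at least `min_sessions` sessions, and score the survivors by lift (how much more often the pair co-occurs than it would if the two products were independent). The function name, both thresholds, and the sample data are hypothetical, and lift is just one common choice of score, not the only one.

```python
from collections import Counter
from itertools import combinations

def strong_pairs(sessions, min_sessions=2, min_lift=1.0):
    """Return {pair: lift} for pairs that stand out from the background noise."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for products in sessions:
        unique = sorted(set(products))  # one vote per product per session
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    result = {}
    for (a, b), together in pair_counts.items():
        # Rare pairs (the prank gift) are dropped here. Lift alone would not
        # catch them, because it tends to inflate pairs with tiny counts.
        if together < min_sessions:
            continue
        # lift = P(a and b) / (P(a) * P(b)), with probabilities as session shares
        lift = (together * n) / (item_counts[a] * item_counts[b])
        if lift > min_lift:
            result[(a, b)] = lift
    return result

# Hypothetical sessions, including one prank gift.
sessions = [
    ["tent", "sleeping-bag"],
    ["tent", "sleeping-bag"],
    ["tent", "sleeping-bag", "whoopee-cushion"],
    ["novel"],
    ["novel", "bookmark"],
    ["novel", "bookmark"],
]

strong = strong_pairs(sessions)
print(strong)
```

With real traffic you would tune both thresholds against the volume of data you have. The point is only that the one-off pairings fall away without anyone having to explain them.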

This is an inversion of the clickstream. It tells us nearly nothing about the clicker. Instead, it illuminates the clickee.