The Hundred-Million-Dollar File
A prediction about the strangest acquisition of the coming decade
Sometime in the next few years, a transaction will close that makes no sense to anyone who reads about it in the traditional way.
There will be no product. No software. No team of engineers joining the acquirer. No users, no revenue, no roadmap. The entire deal (nine figures, perhaps more) will consist of a single file changing hands. A terabyte, maybe less. Small enough to fit on a drive you could carry in your pocket.
And it will be one of the most rational acquisitions of the decade.
The new scarcity
For the past several years, the AI industry has operated on a simple assumption: the internet is the training set. Scrape it, filter it, feed it to the model. But that well is running dry, and everyone building frontier models knows it. Public data has been consumed. Synthetic data helps, but it recycles what the models already know. The next leap in capability won’t come from more of the same data; it will come from data that has never been exposed on the open web.
That data exists. It just doesn’t exist online.
It lives in the accumulated judgment of a veteran commercial litigator who has handled 10,000 contract disputes. In the operational playbooks of a logistics operator who spent thirty years learning what actually breaks in a supply chain. In the diagnostic intuition of a specialist physician, the underwriting instincts of an insurance veteran, the unwritten rules of how deals actually get done in a specific industry, in a specific region, among specific people.
Economists call this tacit knowledge. Institutions call it experience. AI labs will soon call it the most valuable dataset money can buy.
The knowledge founders
Here is the prediction: a new category of builder is about to emerge. Not a software founder, but a knowledge founder.
These will be small teams, sometimes just two or three people, who recognize that their real asset is not a product but a corpus. They will spend three to five years doing deliberate, unglamorous work: documenting decisions and the reasoning behind them, recording expert workflows end-to-end, capturing edge cases, failures, corrections, and the thousand small judgments that separate a competent professional from a great one. They will structure it, annotate it, and, critically, own it cleanly, with airtight provenance and consent.
From the outside, it will look like nothing. No app. No growth chart. No demo day. Just people quietly turning lived expertise into structured data, in domains where the internet has almost nothing to say.
Then one day, an AI lab desperate to make its models genuinely competent in law, medicine, insurance, manufacturing, or frontier-market commerce will come knocking. And the price will not be set by revenue multiples or user counts. It will be set by a simpler question: what is it worth to be the only lab with a model that can do this?
The answer, for the right corpus, will start with a one and end with eight zeros.
Why the deal will look absurd and won’t be
Skeptics will point at the transaction and laugh. A hundred million dollars for a file? For something you could email in pieces?
But we have seen this pattern before. WhatsApp was “just an app” at $19 billion. Instagram was “thirteen people” at a billion. In every era, the market takes years to price a new kind of asset correctly, and the first buyers who understand it look reckless right up until they look prescient.
The economics of a knowledge corpus are, if anything, cleaner than those of software. There is no churn. No maintenance burden. No separate moat to defend, because the moat is the asset itself. If the data captures expertise that took a generation to accumulate and exists nowhere else, no competitor can replicate it at any speed. Scarcity, in its purest form.
What this means now
The incentives all point toward this future. If it arrives, the implications land today.
For professionals with deep, rare expertise: your knowledge is an asset class, and it is currently unrecorded. The window to capture it deliberately, before models learn to extract it some other way, is open now.
For institutions: the archives, playbooks, and case histories gathering dust in your systems may be worth more than your operating business. Provenance and data rights, long treated as legal housekeeping, are about to become the core of enterprise value.
And for the rest of us, a quieter shift: the most valuable companies of the next decade may not look like companies at all. They will look like a few people, a few years, and a file.
A terabyte, maybe less. Worth more than most startups will ever be.