Scaling Institutional Intelligence
Most professional services firms that "adopted AI" gave every employee a very expensive search engine. The engineering that makes AI compound in value, the boring part nobody wants to fund, is still largely optional. For now.
The Productivity Gap Nobody Is Measuring
Picture two associates at two different firms. Same task: draft a client memo for a regulatory update. Both open an AI tool: ChatGPT, Copilot, Claude, Gemini, take your pick. Same category of technology, probably even the same underlying model.
One associate types four words into a purpose-built workflow and gets a client-ready draft in 30 seconds. The system knows their firm's voice, knows the client's industry context, knows the preferred memo structure, and has read the three best examples of this exact document type from the past 18 months.
The other associate is on their fourth attempt. They pasted in a previous example. They added "no, more formal" like they were texting with an intern who has never met the client. Twenty-five minutes later, the output is 70% usable. They rewrite the rest themselves.
The difference is not the AI, and it is not the prompt. It is the engineering underneath: the knowledge infrastructure that one firm built and the other skipped. That infrastructure is called Knowledge Engineering, and it is why most firms are leaving the majority of their AI investment sitting unrealized in a shared drive full of prompt templates nobody updates.
One thing worth naming before going further: this is not an argument that your people will be replaced. The firms that get this right do not end up with fewer consultants. They end up with consultants who spend less time doing work that should be running itself, and more time doing the work clients actually pay premium rates for. That is where the margin comes from.
Where the Budget Actually Goes
When AI lacks context about your firm, every interaction becomes a negotiation: clarification rounds, retry loops, bloated prompts compensating for missing knowledge. Each one burns tokens. None of them produce the output you needed.
And the token spend is the smaller budget. API costs are a rounding error compared to what gets lost in the perpetual argument with the AI: senior people iterating, rephrasing, and eventually just writing the thing themselves. That time is billable. It is going somewhere else.
The licenses were purchased. The all-hands announcement was made. Everyone feels like something happened. But the associates are still copy-pasting from last quarter's deliverables, because the AI has never seen a good example of what the firm actually produces. It is generating plausible text, not institutional knowledge.
The Engineering That Makes the Difference
The market is full of firms selling "prompt libraries" and "AI playbooks." These are the meal kits of enterprise AI: they make you feel productive, they look organized in a shared drive, and they stop working the moment context gets complex. Every firm's context is complex.
Knowledge Engineering is the discipline of encoding your firm's actual business logic, relationships, standards, and accumulated judgment into a system that any AI model can use reliably at scale. Not a template frozen in amber. A living architecture that improves as the firm improves. Four techniques make the difference.
1. Contextual "Goldilocks" Engineering
Every AI instruction set has a sweet spot. Too rigid, and the system breaks on anything outside the template. Too loose, and output drifts into generic filler that could have come from any firm that afternoon. Most firms land on one extreme: the cautious ones write 2,000-word system prompts that become brittle on the first edge case; the other camp types "be professional and concise" and wonders why the output sounds like a 2019 press release. Goldilocks engineering calibrates instructions to the right specificity per task, tested and measured rather than guessed at. Skip it, and the technical term for what follows is prompt drift: a coherent workflow slowly decomposing into a collection of individual workarounds that nobody owns.
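What "tested and measured rather than guessed at" can look like in practice is smaller than it sounds. The sketch below is a minimal illustration in Python: a handful of candidate instruction sets for one task, run against the same small evaluation set and ranked by score. The variant names, the generator, and the scoring function are stand-ins for whatever the firm already uses, not a prescribed implementation.

```python
# Hypothetical sketch: calibrating instruction specificity per task by
# measuring output quality instead of guessing. Names and scoring are
# illustrative, not any particular product's API.

from dataclasses import dataclass


@dataclass
class InstructionVariant:
    name: str           # e.g. "terse", "structured", "fully-specified"
    system_prompt: str  # the candidate instruction set for this task


def calibrate(task_inputs, variants, generate, score):
    """Run each instruction variant over a small evaluation set and
    return the variants ranked by average score.

    generate(system_prompt, task_input) -> model output (callable you supply)
    score(task_input, output) -> float in [0, 1] (rubric-based grader)
    """
    results = []
    for variant in variants:
        scores = [score(t, generate(variant.system_prompt, t)) for t in task_inputs]
        results.append((variant.name, sum(scores) / len(scores)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

The winning variant is not decided in a meeting; it is whichever one scored best on the firm's own tasks, and the same harness reruns whenever the task or the model changes.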
2. Semantic Layering, Not Just Vector Search
Most implementations use vector search: throw documents into a database, find "similar" text, feed it to the model. It works, until it doesn't. Vector search has no concept of meaning. It does not know that a "Client" has "Matters," that Matters produce "Deliverables," or that a litigation deliverable looks nothing like a regulatory filing. That is how an AI confidently pulls a real estate template into a securities compliance memo, and nobody notices until the partner is reviewing it Friday evening. Semantic layering builds an actual ontology of the firm's business logic. When the AI retrieves context, it navigates structured relationships, not just proximity in embedding space.
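For a sense of what that ontology buys you, here is a minimal sketch assuming a simple Client, Matter, and Deliverable model. Retrieval walks the firm's structure first and only hands the eligible documents to the similarity ranking, so the real estate template is out of the running before any embedding is compared. The schema and field names are illustrative, not a standard.

```python
# Illustrative semantic layer over retrieval: a minimal
# Client -> Matter -> Deliverable ontology used to constrain which
# documents the vector search is even allowed to consider.

from dataclasses import dataclass, field


@dataclass
class Deliverable:
    title: str
    doc_type: str          # e.g. "securities_compliance_memo"
    text: str


@dataclass
class Matter:
    name: str
    practice_area: str     # e.g. "securities", "real_estate", "litigation"
    deliverables: list[Deliverable] = field(default_factory=list)


@dataclass
class Client:
    name: str
    industry: str
    matters: list[Matter] = field(default_factory=list)


def retrieve_context(client: Client, practice_area: str, doc_type: str,
                     query: str, vector_search):
    """Navigate the ontology first, then rank only the eligible documents.

    vector_search(query, candidates) -> candidates ranked by similarity
    (any embedding-based ranker you already have).
    """
    eligible = [
        d
        for m in client.matters if m.practice_area == practice_area
        for d in m.deliverables if d.doc_type == doc_type
    ]
    # A real estate template can never be returned for a securities memo:
    # it is filtered out before similarity is ever computed.
    return vector_search(query, eligible)
```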
3. Automated Evaluation (LLM-as-a-Judge)
Here is a question most firms cannot answer: is your AI still performing as well as it was six months ago? Not "does it feel like it," but actually? Most deployed systems drift quietly as underlying models update and retrieval indexes shift. Nobody notices until a partner flags something obviously wrong, by which point the output is already in front of a client. Automated evaluation runs AI-generated outputs through a scoring pipeline continuously. A secondary model grades each output against defined criteria, and when scores drop below threshold the system flags it before a human catches the degradation. It is quality control that never misses a shift.
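The mechanism does not have to be elaborate. Below is a sketch of the core loop, with a placeholder rubric and a judge callable standing in for whatever secondary model does the grading; the threshold and window are assumptions, not recommendations.

```python
# Minimal LLM-as-a-judge sketch: a secondary model grades each production
# output against an explicit rubric, and a rolling average below threshold
# raises a flag before a human notices the drift. Rubric wording and the
# judge interface are placeholders.

from collections import deque

RUBRIC = """Score the draft from 1-5 on each criterion:
- follows the firm's memo structure
- cites the correct matter context
- matches the firm's tone
Return only the average as a number."""


class QualityMonitor:
    def __init__(self, judge, threshold=4.0, window=50):
        self.judge = judge          # judge(rubric, draft) -> float score
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def record(self, draft: str) -> bool:
        """Score one output; return True if quality has degraded."""
        self.recent.append(self.judge(RUBRIC, draft))
        rolling = sum(self.recent) / len(self.recent)
        return rolling < self.threshold   # flag for human review
```

The point is not the dozen lines of code. It is that the threshold is explicit and the check runs on every output, not just the ones a partner happens to read.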
4. Few-Shot Retrieval Pipelines
Static templates are the duct tape of AI implementation. Someone edits the template. Someone else edits it back. A third person creates "v2" in a subfolder. Within months the system produces inconsistent output and everyone assumes the AI is getting worse. A few-shot retrieval pipeline does something smarter: instead of a fixed template, it dynamically retrieves the best past examples of whatever the user is producing, feeds them as context, and the output inherits the firm's current voice and standard automatically. As the firm produces better work, the pipeline draws on better inputs. The system compounds because the firm compounds. No prompt library does that, because prompt libraries are frozen the moment they are written.
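The assembly step is the part worth seeing. A rough sketch follows, with a hypothetical example store standing in for wherever the firm's rated deliverables live: pull the best recent examples of the document type, put them in front of the model, and the draft inherits whatever the firm's current best work looks like. The store interface and prompt layout are assumptions for illustration.

```python
# Sketch of a few-shot retrieval pipeline: instead of a frozen template,
# retrieve the highest-rated recent examples of the same document type
# and assemble them into the prompt.

def build_prompt(request: str, doc_type: str, example_store, k: int = 3) -> str:
    """example_store.top_examples(doc_type, k) -> the k best past
    deliverables of this type (e.g. ranked by partner rating and recency)."""
    examples = example_store.top_examples(doc_type, k)
    shots = "\n\n".join(
        f"EXAMPLE {i + 1}:\n{ex.text}" for i, ex in enumerate(examples)
    )
    return (
        f"You are drafting a {doc_type} for this firm.\n"
        f"Match the structure and voice of the examples below.\n\n"
        f"{shots}\n\n"
        f"REQUEST:\n{request}"
    )
```

Swap better examples into the store and the behavior changes the same day. Nobody edits a template, and there is no "v2" subfolder.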
This Problem Is Older Than AI. The Stakes Just Changed.
Knowledge Engineering is not a discipline that emerged with large language models. The structural fragility of firms that store expertise only in people's heads has been demonstrable for decades. What AI does is make the consequences faster and harder to ignore.
Dewey & LeBoeuf (2012)
1,300+ lawyers. 26 offices. Filed Chapter 11 with $560M in liabilities. The largest law firm collapse in U.S. history. No centralized knowledge infrastructure. When partners left, their expertise and relationships left with them.
McKinsey & Company (1987 to present)
Knowledge infrastructure investment since 1987. $600M+ invested annually. When Lilli, their AI platform, launched in 2023, it drew on 100,000+ curated documents accumulated over decades and reached 72% firm adoption within months.
Dewey's collapse in 2012 is often explained by overleveraged compensation guarantees and a botched merger. Those are the proximate causes. The structural one: institutional knowledge lived entirely in individual partners' heads. When financial pressure hit, partners left in clusters, each wave reducing the firm's remaining value. 17 departures in January. 125 in May. Every exit made the next more likely, because nothing was left behind. No system. No encoded expertise. Just empty offices and $225 million in bank debt.
McKinsey, which has been building the opposite kind of firm since 1987, launched Lilli to 7,000 employees in 2023. Within two months it had answered 50,000 questions. Reported time savings on knowledge tasks: up to 30%. That is not a technology result. Lilli performed well because the foundation was already mature. An AI model dropped into a firm with no knowledge infrastructure produces generically plausible output. The same model in a firm with decades of structured institutional knowledge produces something that sounds like the firm.
The principle holds at any size. A 30-person consulting firm whose institutional knowledge lives in its partners' heads is structurally fragile in exactly the same way Dewey was, just at a smaller scale and with less runway. AI in the agentic era amplifies whatever knowledge infrastructure already exists. Good infrastructure compounds. No infrastructure produces expensive mediocrity at scale.
The AI Works. The Question Is Whether You Built Something Around It.
The firms compounding returns from AI are doing work that is invisible from the outside: mapping domain knowledge into structured ontologies, calibrating instruction sets per workflow, building evaluation pipelines, and engineering retrieval systems that improve as the firm improves. None of that makes for a conference keynote. It is plumbing. It determines everything.
Gartner estimates poor data quality costs the average organization $12.9 million per year; scale that down for a 50-person firm and it is still a number worth paying attention to. Strong knowledge management systems reduce information retrieval time by up to 35% and boost productivity by 20 to 25%. The gap between those two outcomes is an engineering choice, not a software one.
Professional services firms sell expertise. The ones that encode that expertise into systems that scale will be structurally different: more resilient to turnover, more consistent in output quality, and better positioned to maintain fees as AI commoditizes the work that used to justify them. The others will keep wrestling with the tool 25 minutes at a time, muttering something about AI not being quite there yet.
The natural question is whether a firm should just build this internally. Some do. The more common pattern is six months and a real budget spent discovering that ontology design, retrieval calibration, and continuous evaluation are harder than they looked from the outside, at which point the outside help conversation happens anyway, just later and more expensively.
The AI is there. The question is whether the engineering is.
30 minutes. No pitch. Just an honest look at the gap.
We build the knowledge infrastructure that makes AI systems actually perform. Not prompt libraries. Not chatbot wrappers. Engineered systems that improve on their own.
Start a Conversation