
TL;DR
- The people getting the most AI value right now are the ones the firm has least visibility into. The partner prepping depositions with Claude and the associate who vibe-coded a contract script aren’t filing technology requests. Levels 1–3 are where real adoption happens, largely ungoverned.
- AI use spans five levels — mismatching level to task wastes money in both directions. A six-month vendor evaluation for a task an associate could solve in an afternoon is over-engineering. A fragile undocumented script becoming the firm’s de facto contract pipeline is under-engineering. Both happen constantly.
- Agent harnesses are the 2026 dividing line. Level 2 is no longer generic workflow automation. It’s Claude Cowork, Microsoft Copilot, ChatGPT Enterprise with connectors and tools, and similar agentic workspaces that run reusable skills on a matter’s files.
- Vibe-coded apps deliver fast early wins and fragile results. One analysis of AI-generated pull requests found 1.7x more major issues than in human-written code. A script that works on 50 standard contracts can fail silently when contract 12 of the next deal uses a cross-reference instead of a dollar figure.
- Knowledge management is the bridge from individual prompting to agentic workflows. Connecting a firm’s playbooks and clause libraries to AI tools is what moves teams from Level 1 to Level 2 — Freshfields increased AI usage 500% in six weeks by doing exactly this.
- When someone says “AI-native,” ask at which levels they’re actually operating. Crosby, Lawhive, and Garfield AI represent three structurally different answers — and none of them compete for the same work BigLaw handles.
Corrections & Updates
- June 22, 2026: Reframed Level 2 from “Workflow Automation” to “Agent Harnesses” to reflect agentic developments since publication — tools like Claude Cowork now run reusable skills directly on a matter’s files, which subsumes the no-code automation this level originally described. Reframed Level 3 as “Vibe-Coded Apps” to distinguish user-built software from agentic workspaces.
A litigation partner at a midsize firm uses Claude to prep for depositions: transcripts in, summaries out, follow-up questions as needed. The work happens outside any formal firm initiative or approved vendor process.
At the same firm, the innovation team is spending $400,000 to evaluate enterprise AI platforms for contract review. Six months in, nothing has deployed.
Neither approach is wrong, but they require completely different frameworks to evaluate — and most firms treat them as the same conversation. The 2026 shift is that the middle of the spectrum has split: agent harnesses run reusable skills inside existing workspaces; vibe-coded apps create new software outside them. Understanding where a given use case sits determines everything: what to buy, what to build, what risks to manage, and what to ignore. The point isn’t to force every use into an approved platform. It’s to build governance that starts from reality rather than from an org chart.
The Five Levels
Level 1: Personal Enhancement
A lawyer opens ChatGPT, Claude, or Gemini and asks it to do something. Summarize this deposition excerpt. Rewrite this email to sound less aggressive. Explain what “anti-dilution ratchet” means in plain English. Draft a first pass of interrogatories.
This is the most common form of AI use in law today, and it’s almost entirely invisible to firm management. Thomson Reuters’ 2025 Future of Professionals Report found that individual professionals were adopting AI tools faster than their organizations. The gap hasn’t closed.
At this level, the AI is a personal productivity multiplier. The lawyer provides the input, evaluates the output, and decides what to use. There’s no system integration, no retrieval pipeline, no firm data involved — just a human and a chat window. The cost is $0-20/month for a consumer subscription.
The trade-off is real but manageable: output quality depends entirely on the individual’s prompting skill, there’s no institutional knowledge in the loop, and the firm has no visibility into what’s being processed. If the partner pasting deposition transcripts into a consumer chat product hasn’t read the provider’s data retention terms, that’s a privilege and confidentiality question the firm doesn’t even know to ask. ABA Formal Opinion 512 (July 2024) requires lawyers using AI to understand how the technology handles confidential information — a requirement that’s hard to satisfy when the firm doesn’t know the technology is being used.
Level 2: Agent Harnesses
A lawyer works inside an agentic harness — Claude Cowork, Microsoft Copilot, ChatGPT Enterprise with connectors and tools, or a similar AI workspace — that goes beyond answering questions to acting on the firm’s actual files. Point it at a matter folder and it reads, drafts, and organizes across the documents; give it a reusable skill — packaged instructions for a recurring task like classifying intake emails, first-passing NDAs against the playbook, or coding invoices to the right matter — and it runs that task the same way every time.
The leap from Level 1 isn’t sophistication; it’s agency plus repeatability. The system can inspect a workspace, use tools, act across files, and rerun a defined skill without the lawyer rebuilding the prompt from scratch each time. A skill is a reusable workflow: whoever built it wrote and tested the instructions once, and after that the harness applies them consistently, with no custom application and no user-written code.
This is where AI starts saving measurable time, and where errors start compounding. A bad output on one document is a mistake; a flawed skill running across a matter folder, or a classifier running on 200 emails a week for three months, is a systemic failure no one catches until something breaks. Level 1 has a human reviewing every output; Level 2 often doesn’t. That gap creates a supervision problem: Model Rules 5.1 and 5.3 require supervisory lawyers to make reasonable efforts to ensure that work done under their authority, whether by other lawyers or by nonlawyer assistance, meets professional obligations. An agent acting across client files is work done under someone’s authority even when no one has assigned that responsibility explicitly.
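To make the scale of that gap concrete, here is a minimal sketch of a weekly spot-check for a Level 2 skill. The volume comes from the example above; the sample size of 20, the seed, and the function name `audit_sample` are illustrative assumptions, not a recommended review rate.

```python
import random

EMAILS_PER_WEEK = 200   # figure from the example above
WEEKS = 13              # roughly three months

def audit_sample(item_ids, sample_size, seed):
    """Pick a reproducible random subset of a skill's outputs for human review."""
    ids = sorted(item_ids)        # stable order, so the seed fully determines the draw
    rng = random.Random(seed)     # private generator; does not touch global random state
    return rng.sample(ids, min(sample_size, len(ids)))

# Outputs that accumulate with no human look if nobody samples.
unreviewed = EMAILS_PER_WEEK * WEEKS

# Hypothetical weekly draw: 20 of the week's 200 classified emails go to a reviewer.
this_week = audit_sample(range(EMAILS_PER_WEEK), sample_size=20, seed=2026)
```

Fixing the seed means the firm can later show which items were pulled for review, which matters more for supervision than the exact sampling rate.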
Level 3: Vibe-Coded Apps
An associate who knows a little Python — or more likely, knows how to describe what she wants to Claude Code, Cursor, or Replit — builds a small application for a specific problem. A script that extracts indemnification caps from a stack of 50 purchase agreements. A dashboard that tracks opposing counsel’s motion practice across three related cases. A tool that compares two contract versions and produces a redline summary.
This is vibe coding: describing the desired software in natural language and letting an AI generate it. The term, coined by Andrej Karpathy in early 2025, captures a real shift. Building functional software no longer requires knowing how to write it. A quarter of Y Combinator’s Winter 2025 startups had codebases written almost entirely by AI.
The boundary with Level 2 matters. In Level 2, the lawyer is using an agent harness and a reusable skill inside an existing AI environment. In Level 3, the lawyer has created software: a script, dashboard, database, web app, or command-line tool that persists outside the chat window. Instead of waiting six months for the innovation team to evaluate vendors, an associate builds what she needs in an afternoon. The tool does exactly what her workflow requires. It costs nothing beyond the AI subscription she already has.
It also has no tests, no error handling, no documentation, and no one who can fix it when it breaks. A grey literature review of practitioner accounts found a consistent pattern: vibe coders experience rapid early success, then hit a wall when the generated code encounters inputs it wasn’t built for. One analysis of AI-generated pull requests found 1.7x more major issues than in human-written code.
In a legal context, those failure modes aren’t abstract. An associate builds a script to extract indemnification caps from 50 purchase agreements for a deal. It works — every contract in the set uses a “not to exceed $[amount]” pattern the model handles cleanly. Six months later, another associate reuses the script on a different deal. Contract 12 in the new set caps indemnification through a cross-reference to a defined term on a different page. The script doesn’t follow the reference. It reports “no cap identified.” The deal team relies on that output, and no one catches the $5 million error until the client asks why the indemnification analysis is missing a key term. Nobody remembers how the script works. Nobody can audit what it did.
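The failure in that story is a silent one: the script had only two answers, a dollar figure or "no cap identified." A minimal sketch of a safer shape follows, assuming a regex-based extractor. Both patterns and the `CapResult` type are hypothetical and far simpler than real agreements require. The point is the third status, which sends a cross-referenced cap to a human instead of reporting that none exists.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only.
DOLLAR_CAP = re.compile(r"not to exceed \$(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
# A cap expressed through a capitalized defined term, e.g. "the Cap Amount".
CROSS_REF_CAP = re.compile(
    r"(?i:not to exceed (?:the )?)([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"
)

@dataclass
class CapResult:
    status: str                     # "found", "needs_review", or "not_found"
    amount: Optional[float] = None
    note: str = ""

def extract_cap(text: str) -> CapResult:
    """Extract an indemnification cap, or say explicitly why a human must look."""
    dollars = DOLLAR_CAP.search(text)
    if dollars:
        return CapResult("found", float(dollars.group(1).replace(",", "")))
    reference = CROSS_REF_CAP.search(text)
    if reference:
        # Cap language exists but points elsewhere; do not report "no cap".
        return CapResult(
            "needs_review",
            note=f"cap defined by reference to '{reference.group(1)}'",
        )
    return CapResult("not_found", note="no cap language matched; confirm by hand")
```

Run on a clause like "not to exceed the Cap Amount (as defined in Section 9.2)", this returns `needs_review` with the defined term named in the note. It still does not follow the reference, but the deal team sees a flag rather than a false negative.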
There’s a deeper problem beyond fragile code. LLM-powered tools are typically nondeterministic: run the same document twice and the output can differ slightly. A traditionally engineered parser runs the same way every time, on commodity hardware, for fractions of a cent. An LLM-powered parser sends the full document to an API, pays per token, and produces results that can vary between runs. For classification, that’s usually harmless. For extracting a dollar figure a deal team will rely on, it’s a problem. Law runs on consistency, and a tool that produces slightly different extractions each time is solving one problem while creating another.
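One common mitigation, sketched here under assumptions, is to run the extraction several times and accept a value only when every run agrees. The `extract` argument stands in for whatever LLM-backed function a team has built; it is a hypothetical placeholder, not a real API. Agreement across runs shows consistency, not correctness, and each extra run multiplies the per-token cost.

```python
from typing import Callable, Optional

def agreed_value(
    extract: Callable[[str], str], document: str, runs: int = 3
) -> Optional[str]:
    """Return the extracted value only if all runs produce the same string."""
    outputs = [extract(document).strip() for _ in range(runs)]
    # Any disagreement means the figure goes to a human instead of the deal memo.
    return outputs[0] if len(set(outputs)) == 1 else None
```

A `None` result is the useful signal here: it marks exactly the documents where the model is unstable and a lawyer should read the clause directly.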
The ethical dimension sharpens at this level. Model Rule 1.1 (competence) requires lawyers to understand the tools they use. An associate who deploys a vibe-coded tool she can’t debug, on client data she can’t trace, is relying on a system she doesn’t understand to produce work product she’s responsible for.
Level 4: Internal Applications
The firm’s technology team takes a Level 3 concept and turns it into something durable. They build a contract analysis application with proper authentication, error handling, logging, and a user interface that doesn’t require a command line. It connects to the firm’s document management system, uses the firm’s playbook as a retrieval source, and routes outputs to the right practice group.
This is software development — not vibe coding, not prompt engineering, but actual engineering. Platforms like Harvey and DeepJudge sell infrastructure for building these applications: retrieval pipelines, agent frameworks, and compliance tooling that sit between the foundation model and the firm’s data.
Level 4 requires dedicated engineering resources. A firm building internal applications needs at least one developer who understands LLM architecture, plus ongoing maintenance as underlying models change. (When Anthropic ships a new Claude version, prompts that worked on the old version may not work on the new one.) The payoff is a tool tailored to the firm’s specific document types, workflows, and quality standards — something no off-the-shelf product replicates exactly.
The build-vs.-buy calculus: if the task is high-volume, narrow, and poorly served by existing vendors, building makes sense. Replicating a well-served commercial product usually means spending engineering salary to save on license fees.
Level 5: Enterprise Platform
The firm deploys a commercial legal AI platform across practice groups: CoCounsel for research, Kira for due diligence, Everlaw for e-discovery, Spellbook for contract review. These are full products with managed infrastructure, compliance certifications, vendor support, training programs, and integration with the firm’s existing systems.
At Level 5, the firm is a buyer, not a builder. The value proposition is everything the firm doesn’t have to do: prompt engineering, model evaluation, retrieval pipeline design, security audits, ongoing testing. The vendor has already solved these problems and amortized the cost across hundreds of customers. That’s the 60-200x markup from raw model cost to product price — and for most firms, it’s worth it.
The risk at Level 5 is vendor dependency. If a contract review workflow runs on a single vendor’s platform and that vendor changes its model, reprices its API, or gets acquired, the workflow changes with it. Enterprise buyers should be asking: What foundation model does this run on? Where are client documents processed? What happens when the model updates? These platforms also face growing pressure from below. As Levels 3 and 4 make it easier for firms to build narrow tools in-house, vendors that can’t justify their markup over raw model costs will lose to internal builds and AI-native competitors. That is the dynamic behind the SaaSpocalypse, which erased roughly $2 trillion from software valuations in early 2026.
The Spectrum in Practice
Most firms don’t sit neatly at one level. They operate across several simultaneously, often without realizing it.
The litigation partner prepping for depositions at Level 1 is not waiting for the firm to finish a Level 5 vendor evaluation. The associate building extraction scripts at Level 3 is not waiting either. These uses coexist, and the firm is better off acknowledging them than treating sanctioned tools as the whole adoption picture.
The practical question for any legal team is: which level does this task belong at?
A one-off deposition summary? Level 1. A reusable skill that classifies intake emails in an agentic workspace? Level 2. A vibe-coded dashboard that extracts key terms from 50 contracts for a single deal? Level 3. Standardizing contract analysis across the corporate practice group? Level 4 or 5, depending on whether the firm has the engineering talent to build or should buy.
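The examples above can be read as a rough triage order. The sketch below is one possible encoding of them, not a rule this post prescribes: the `Task` fields and the order of the checks are assumptions chosen to reproduce the four examples.

```python
from dataclasses import dataclass

@dataclass
class Task:
    recurring: bool               # will the same task run again?
    needs_new_software: bool      # does it need a script or app outside the AI workspace?
    practice_wide: bool           # is it being standardized across a practice group?
    has_engineers: bool = False   # can the firm build and maintain it in-house?

def suggest_level(task: Task) -> int:
    """Map a task to a level on the spectrum, checking the broadest scope first."""
    if task.practice_wide:
        return 4 if task.has_engineers else 5   # build if the talent exists, else buy
    if task.needs_new_software:
        return 3                                # vibe-coded app for a specific problem
    if task.recurring:
        return 2                                # reusable skill in an agent harness
    return 1                                    # one-off prompt in a chat window

# The one-off deposition summary from the example above.
deposition_summary = Task(recurring=False, needs_new_software=False, practice_wide=False)
```

Here `suggest_level(deposition_summary)` returns 1. A real triage would also weigh data sensitivity and review requirements, which this sketch leaves out.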
Mismatching the level to the task wastes money and time in both directions. Running a six-month vendor evaluation for a task an associate could solve in an afternoon with a chat window is over-engineering. Letting an associate’s fragile, undocumented script become the firm’s de facto contract analysis pipeline is under-engineering. Both happen constantly.
The Knowledge Management Bridge
The most common firm-level strategy right now is using knowledge management to push lawyers from Level 1 (individual chat windows) toward Level 2 (agent harnesses with institutional knowledge in the loop). The idea: capture the firm’s playbooks, clause libraries, and practice-group standards in a structured way, then wire that knowledge into reusable skills that produce consistent outputs instead of one-off answers that vary by whoever wrote the prompt.
This isn’t new. Law firms have been trying to systematize knowledge management for 10-15 years — Harvard’s Center on the Legal Profession notes that about a third of firms have some form of practice methodologies in place, often under the banner of legal project management. The results have been marginal. Lawyers don’t fill out knowledge management systems for the same reason they don’t fill out time sheets promptly: the benefit is collective, the cost is individual, and the deadline is always something else.
AI changes the value proposition. A clause library sitting in a SharePoint folder is inert — useful only if someone remembers it exists and searches for it. The same clause library connected to an AI workflow is active: the system pulls the firm’s preferred indemnification language when an associate asks it to review a contract, flags deviations from the playbook, and suggests fallback positions from the firm’s own negotiation history. The knowledge management layer becomes the difference between a generic LLM output and one grounded in how this firm actually practices.
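As a rough illustration of what "flags deviations from the playbook" can mean at its simplest, the sketch below compares a draft clause with the playbook's preferred text using Python's standard `difflib`. The 0.9 threshold and the function names are assumptions. Textual similarity is also a weak proxy: changing "shall" to "may" scores as nearly identical, so a production tool would need more than this.

```python
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not a deviation."""
    return " ".join(text.lower().split())

def flag_deviation(playbook_clause: str, draft_clause: str, threshold: float = 0.9):
    """Return (is_deviation, similarity) for a draft clause against playbook text."""
    similarity = SequenceMatcher(
        None, normalize(playbook_clause), normalize(draft_clause)
    ).ratio()
    return similarity < threshold, similarity
```

The value of wiring this to a clause library is less the comparison itself than the fact that the preferred language gets consulted on every review without anyone remembering to search for it.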
In practice, this looks like practice groups building what amount to prompt-and-retrieval packages: a contract playbook (preferred positions, fallback language, deal-breaker terms), a clause bank, a set of templates, and a curated prompt library — all feeding into an AI tool like Harvey, Spellbook, or even a firm-specific RAG pipeline. Freshfields recently announced a multi-year collaboration with Anthropic to build exactly this: firm-wide AI workflows connected to the firm’s institutional knowledge, deployed across all 33 offices and every practice group. Within six weeks, usage increased 500%.
The approach works best when firms treat it as a transition strategy rather than a destination. A practice group that builds a contract review playbook and connects it to an AI tool has moved from Level 1 (individual associates prompting from scratch) to Level 2 (an agent harness running that playbook as a reusable skill). If that playbook gets built into software — first a vibe-coded app, then a proper application with error handling and quality control — it reaches Level 3 or 4. The knowledge management layer is the bridge — it’s what turns ad hoc AI use into something the firm can govern, improve, and scale.
The hard part isn’t the technology. It’s the same problem knowledge management has always had: getting lawyers to contribute. The firms seeing results are the ones that build capture into the workflow itself — when a lawyer corrects an AI’s contract markup, the correction updates the playbook automatically, so the next review starts from a better baseline. The knowledge management system improves as a byproduct of doing the work, rather than requiring a separate act of documentation.
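A minimal sketch of that capture loop, assuming a playbook stored as a simple mapping from clause type to preferred language. The `Playbook` class, its fields, and the reviewer ID are hypothetical. The sketch applies every correction immediately; a firm might reasonably want an approval step before one lawyer's edit becomes the default.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Playbook:
    positions: dict = field(default_factory=dict)   # clause type -> preferred language
    history: list = field(default_factory=list)     # audit trail of every change

    def record_correction(self, clause_type, ai_text, lawyer_text, reviewer, when):
        """Fold a lawyer's edit of AI markup back into the playbook."""
        if lawyer_text.strip() == ai_text.strip():
            return False                             # lawyer accepted the AI text as-is
        previous = self.positions.get(clause_type)
        # Keep who changed what and when, so the playbook stays auditable.
        self.history.append((when, reviewer, clause_type, previous, lawyer_text))
        self.positions[clause_type] = lawyer_text
        return True

playbook = Playbook({"indemnification": "Cap at 10% of purchase price."})
changed = playbook.record_correction(
    "indemnification",
    ai_text="Cap at 10% of purchase price.",
    lawyer_text="Cap at 15% of purchase price.",
    reviewer="associate-1",                          # hypothetical reviewer ID
    when=date(2026, 1, 5),
)
```

The lawyer never opens a knowledge management system; the update happens as a side effect of correcting the markup, and the history records what the position was before.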
AI-Native Law Firms
“AI-native” has become one of those phrases that means whatever the speaker needs it to mean. A lawyer with a ChatGPT subscription, a firm with Harvey licenses, and a fully autonomous debt-recovery service can all claim the label. The spectrum makes the question more precise: at which levels are they actually operating?
A firm using AI at Level 1 across the board isn’t AI-native in any meaningful sense. It’s AI-available. The firms that warrant the label share three characteristics: AI handles the default workflow, humans intervene by exception, and pricing is fixed or outcome-based rather than hourly. By that definition, very few firms qualify — but the ones that do show what the spectrum looks like without legacy infrastructure in the way.
Crosby is the Level 4 version: custom software plus lawyers for contract review, with AI agents doing intake, analysis, and drafting while lawyers handle judgment calls. Lawhive is closer to Level 5: an AI operating system plus a lawyer network for consumer law, with fixed fees and far higher case volume. Garfield AI is the narrow autonomous version: small business debt recovery under £10,000, solicitor oversight, and client approval before each step.
The pattern is the same across all three. AI-native firms target high-volume, standardizable work where traditional overhead creates a price gap wide enough to build a business in. They aren’t competing for complex M&A or bet-the-company litigation. They’re competing for the work that partners already consider low-margin and associates consider tedious — a large share of the legal market by volume, even if it’s a smaller share by revenue.
The Uncomfortable Part
The AI use spectrum isn’t a maturity model. Level 5 isn’t better than Level 1. But the framework exposes something most firms haven’t reckoned with: the people getting the most value from AI right now are the ones the firm has the least visibility into.
The lawyer using Claude for deposition prep is not filing a technology request. The associate who vibe-coded a contract extraction script is not submitting it to IT for review. These are real uses of AI on client work, and the firm’s governance framework — if it has one — often does not account for them. The formal AI strategy, the six-month vendor evaluation, the approved tool list — all of that addresses Levels 4 and 5. Levels 1 through 3 are where much of the adoption is happening, largely ungoverned.
That’s not a problem to solve with a policy memo. It’s a problem to solve by being honest about what’s already happening, and building governance that starts from reality rather than from an org chart. The firms that get this right won’t be the ones with the most sophisticated AI platform. They’ll be the ones that figured out which level each task actually belongs at.
Further Reading
- 10 AI Law Firms to Watch in 2026. Lupl’s survey of AI-native legal service providers.
- SRA Approves First AI-Driven Law Firm. The UK Solicitors Regulation Authority’s announcement on Garfield AI.
- UK’s SRA Takes Unprecedented Approach in Authorising AI-Enabled Firms. International Bar Association analysis of the regulatory trend.
- Lawhive Raises $60 Million in Series B Funding. Fortune’s profile of the AI-native consumer law firm.
- AI for Legal Knowledge Management: Build a Precedent + Prompt System. Clio’s practical guide to building KM-powered AI workflows.
- The Impact of AI on Law Firms’ Business Models. Harvard Law’s Center on the Legal Profession on practice methodologies and AI.
- Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook. Grey literature review of 518 practitioner accounts on vibe coding trade-offs.
- ABA Formal Opinion 512. The ABA’s 2024 guidance on lawyers’ duties when using AI.
This post is part of the AI Adoption Strategy series on LegalRealist AI. It is intended for informational and educational purposes only and does not constitute legal advice. AI capabilities, pricing, and product features described here reflect publicly available information as of the publication date and are subject to rapid change. Laws governing AI use vary by jurisdiction.



