Definitions of AI and legal technology terms used throughout LegalRealist. Terms are linked from posts at first mention. The glossary updates automatically as new posts introduce new concepts.
A
Agentic AI
(AI agents, agentic)
AI systems that go beyond single-prompt responses to autonomously plan, execute, and iterate on multi-step tasks — researching across sources, drafting documents, running analyses, and refining outputs without requiring human intervention at each step. Distinguished from basic chatbot interactions by the ability to use tools, manage context across steps, and self-correct.
A set of statistical and machine learning techniques used to identify data points, events, or patterns that deviate significantly from expected behavior. In government investigations and compliance, anomaly detection can flag suspicious transactions, unusual billing patterns, or outlier behaviors across large datasets — finding the needles that human reviewers would miss.
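As a rough illustration of the idea, the sketch below flags billing amounts whose z-score (distance from the mean in standard deviations) exceeds a threshold. The function name, the threshold of 2.0, and the sample amounts are all assumptions made for this example; production systems typically use more robust methods, because a single extreme value inflates the standard deviation it is measured against.

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=2.0):
    """Return (index, amount) pairs whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [(i, x) for i, x in enumerate(amounts)
            if abs(x - mu) / sigma > threshold]

# Hypothetical invoice amounts; the last one is the planted anomaly.
billing = [120.0, 135.0, 110.0, 128.0, 131.0, 119.0, 125.0, 940.0]
flagged = flag_outliers(billing)
print(flagged)
```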
The custom software a vendor builds on top of a foundation model — including prompts, retrieval pipelines, fine-tuning, user interface, and workflow logic. Most “proprietary AI” in legal tech is application-layer work; the foundation model itself is licensed from OpenAI, Anthropic, Google, or another lab.
A standardized test designed to compare language model performance on specific tasks. Legal benchmarks like LegalBench measure issue-spotting and rule-recall; needle-in-a-haystack tests measure retrieval under increasing context length. Useful for rough comparisons, but benchmark scores can be gamed and don’t always predict real-world performance — the top models on public leaderboards often fall within statistical noise of each other.
A technique, used either through explicit prompting or built into the model through training, where the language model works through a problem step by step rather than jumping to a final answer. OpenAI’s o-series models use internal chain-of-thought reasoning that improves performance on complex analytical tasks but adds to output token cost. In legal applications, chain-of-thought prompting can improve contract analysis and issue-spotting by forcing the model to articulate its reasoning.
The process of splitting documents into smaller segments (chunks) for storage in a vector database. When a user queries the system, the most relevant chunks are retrieved and fed to the model as context. Chunk boundaries matter enormously: too small and the model loses surrounding context; too large and irrelevant content dilutes the signal. In legal documents, poor chunking can sever a clause from its definitions section or a finding from its evidentiary basis.
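One common way to soften the boundary problem is to let consecutive chunks overlap, so text near a boundary appears in both neighbors. The sketch below is a minimal word-based version; the function name and the default sizes are assumptions for illustration, and real pipelines often split on sections or clauses rather than raw word counts.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

# Twelve placeholder words, chunked five at a time with two words shared.
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
```

Each chunk here begins with the last two words of the previous one, which is the property that keeps a clause from being cut off from the sentence that introduces it.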
The phenomenon where language models perform measurably worse as the amount of input text grows, even when the task itself doesn’t get harder. A model that correctly identifies a clause in a 10-page contract may miss the same clause in a 200-page filing — not because the task changed, but because more context dilutes the model’s attention. Directly relevant to evaluating vendor claims about large context windows.
The maximum number of tokens a model can process in a single request, counting both the prompt and the generated output; in effect, its working memory. A 200K-token window holds a lengthy contract and exhibits; 1M–2M token windows can ingest entire deal rooms. If a document exceeds the window, the model can’t reference earlier content when analyzing later content.
Numerical vector representations of text produced by a neural network, where semantically similar texts are mapped to nearby points in a high-dimensional space. Embeddings power semantic search in RAG systems: instead of matching keywords, the system finds text whose meaning is closest to the query. The quality of embeddings determines whether a retrieval system finds the right passages or misses them.
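To make "nearby points" concrete, the sketch below ranks passages by cosine similarity to a query vector. The three-dimensional vectors are invented for illustration; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the first two passages mean similar things
# despite sharing no keywords; the third is unrelated.
passages = {
    "indemnification clause": [0.9, 0.1, 0.0],
    "hold harmless provision": [0.8, 0.2, 0.1],
    "office parking policy": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]

ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query, passages[name]),
    reverse=True,
)
```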
The process of further training a pre-trained foundation model on a specific dataset to specialize it for a task or domain. Distinct from prompt engineering (which changes inputs) and retrieval (which changes what context the model sees). Most legal AI tools rely more heavily on retrieval than fine-tuning because legal content changes faster than fine-tuning cycles.
A large, general-purpose model trained on broad data that can be adapted to many downstream tasks through fine-tuning, prompting, or retrieval. Examples: OpenAI’s GPT family, Anthropic’s Claude family, Google’s Gemini family, Meta’s Llama family.
A company building the most capable foundation models. As of 2026, the major frontier labs are OpenAI, Anthropic, Google DeepMind, Meta, xAI, and DeepSeek. Training a frontier model costs $100M+ and requires thousands of specialized processors.
The practice of anchoring a language model’s outputs to verified, retrievable sources rather than allowing it to generate from training data alone. Grounding techniques include retrieval-augmented generation, citation verification, knowledge graphs, and constrained generation. Since hallucination cannot be eliminated at the model level, grounding is the primary mechanism serious legal AI tools use to make outputs trustworthy.
Technical constraints built into an AI system that limit what the model can generate. LegalOn constrains outputs to attorney-written playbooks; Everlaw constrains to case documents. Guardrails reduce hallucination by narrowing the model’s output space — if the model can only select from pre-approved language or cite from a verified corpus, it has fewer opportunities to fabricate. A distinct mitigation strategy from retrieval, though often used together.
When a language model generates content that is fluent and plausible but factually wrong — citing a nonexistent case, misstating a holding, fabricating a statute. Not a bug; a structural feature of probabilistic text generation. Can be reduced through retrieval and verification but not eliminated.
The process of feeding input to a trained language model and receiving generated output — distinct from training, which builds the model in the first place. Every API call to Claude, GPT, or Gemini is an inference request. Inference cost (measured in dollars per million tokens) and inference speed (latency) are the two factors that determine whether an AI tool is economically viable and responsive enough for interactive legal work.
A structured database that represents entities (people, organizations, cases, statutes, concepts) as nodes and their relationships as edges, enabling queries that traverse connections rather than just match keywords. In legal AI, knowledge graphs power citation networks (like Shepard’s and KeyCite), connect related cases and statutes, and help retrieval systems understand that a party in one filing is the same entity referenced differently in another.
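A minimal sketch of what "traversing connections" means, assuming the graph is stored as a plain dictionary of outgoing edges. The node names are placeholders, not real authorities, and a production system would use a dedicated graph store with typed relationships.

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen

# Hypothetical "cites" edges between placeholder authorities.
citations = {
    "Case A": ["Case B", "Statute X"],
    "Case B": ["Case C"],
    "Case C": [],
    "Case D": ["Case A"],
}
authorities = reachable(citations, "Case A")
```

A keyword search for "Case A" would never surface "Case C"; the traversal finds it because it sits two citation hops away.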
An AI system trained on large volumes of text to predict and generate language. Modern LLMs are built on transformer architectures and trained on hundreds of billions to trillions of words. The technology underneath nearly every legal AI tool.
A well-documented bias in transformer-based language models where information placed in the middle of a long input is used less reliably than information at the beginning or end. Research shows a U-shaped accuracy curve: models are most accurate when relevant information appears in the first or last positions. For lawyers submitting large document sets to AI tools, this means document ordering can affect output quality.
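One mitigation sometimes used in retrieval pipelines is to reorder documents so the most relevant ones sit at the edges of the prompt. The sketch below is an illustrative version of that idea, not a method described in any particular product; it assumes the input list is already sorted from most to least relevant.

```python
def reorder_for_long_context(ranked_docs):
    """Place the most relevant documents at the start and end of the list.

    Assumes `ranked_docs` is sorted from most to least relevant.
    """
    front, back = [], []
    for i, doc in enumerate(ranked_docs):
        # Alternate: even ranks go to the front, odd ranks to the back.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

# Rank 1 is the most relevant document, rank 5 the least.
ordered = reorder_for_long_context(["rank1", "rank2", "rank3", "rank4", "rank5"])
```

With five documents the result is rank1, rank3, rank5, rank4, rank2, so the weakest match lands in the middle, where the model is least reliable.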
A set of techniques where algorithms learn patterns from data — identifying relationships, classifying inputs, and making predictions — rather than following explicit rules written by a programmer. Supervised learning trains on labeled examples (e.g., returns flagged as fraudulent); unsupervised learning finds patterns in unlabeled data (e.g., clustering billing anomalies). In government enforcement, machine learning powers fraud scoring, anomaly detection, and audit selection across millions of records simultaneously.
An open protocol, introduced by Anthropic in late 2024, that standardizes how AI applications connect to external data sources, APIs, and tools. MCP servers expose capabilities — database queries, API calls, file access — that AI models can invoke during a conversation without custom integration code. In legal tech, MCP enables AI assistants to query case law databases, court filing systems, and document repositories directly rather than relying solely on pre-loaded context.
Model Routing
A system architecture that directs queries to different foundation models based on task characteristics. Harvey routes queries to Claude for reasoning and Gemini for vision tasks. A well-designed routing layer uses cheap, fast models for simple tasks (summarization, formatting) and reserves expensive frontier models for complex analysis — cutting costs without sacrificing quality where it matters.
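A routing layer can be as simple as a rule that looks at the task type and input size. The sketch below is a hypothetical rule-based router; the task labels, the token threshold, and the model names are placeholders, and real systems may use a classifier rather than fixed rules.

```python
# Hypothetical task labels considered cheap enough for a small model.
SIMPLE_TASKS = {"summarize", "format", "extract_dates"}

def route(task_type, token_count, long_input_threshold=50_000):
    """Pick a model tier from the task type and input length."""
    if task_type in SIMPLE_TASKS and token_count < long_input_threshold:
        return "small-fast-model"   # placeholder name
    return "frontier-model"         # placeholder name

choice_simple = route("summarize", 3_000)
choice_complex = route("privilege_analysis", 3_000)
choice_long = route("summarize", 120_000)
```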
A neural network architecture that splits a model’s parameters into many specialized sub-networks (“experts”) and routes each token to only a small subset. A 1.6-trillion-parameter MoE model might activate only 49 billion parameters per token, achieving the knowledge capacity of a massive model at the inference cost of a much smaller one. DeepSeek pioneered fine-grained MoE with 256 small experts per layer; the approach is now standard in frontier models from Chinese and Western labs alike.
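In the figures above, 49 billion active parameters out of 1.6 trillion is roughly 3% of the model per token. The routing step itself can be sketched as a top-k gate: score every expert, keep the k highest, and normalize their weights. This is a simplified illustration with made-up scores; real routers are learned layers and include load-balancing terms not shown here.

```python
import math

def top_k_gate(scores, k=2):
    """Select the k highest-scoring experts and softmax their scores."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Hypothetical router scores for four experts on one token.
weights = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)

# Share of parameters active per token in the example from the text.
active_fraction = 49e9 / 1.6e12
```

Only experts 1 and 3 run for this token; the other two contribute nothing to its computation, which is where the inference savings come from.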
An AI system’s ability to process and reason across multiple input types beyond text. Gemini can ingest raw video depositions, recorded witness interviews, and surveillance footage directly; most legal AI tools are text-only. Multimodal capabilities matter for litigation involving physical evidence, multimedia communications, or document formats that mix text with images and diagrams.
A foundation model whose parameters (weights) are publicly released, allowing self-hosting and fine-tuning. Examples: Meta’s Llama, DeepSeek’s R1, Mistral’s models. Distinct from “open-source” in the strict sense, which would require training data and code to also be released. Allows firms to process documents without sending them to a third-party API.
A class of attack where two systems parse the same document but read different content because they consume different layers of the file format. In web security, HTTP request smuggling is the classic example. In legal tech, parser differentials exploit the gap between what a human sees (rendered fonts, formatted cell values) and what an extraction library passes to an LLM (raw codepoints, raw cell data). The model reasons correctly over wrong inputs — making the attack invisible to both the human reviewer and the AI.
The practice of designing inputs to a language model to produce useful outputs. Includes structuring instructions, providing examples, and chaining multiple prompts together. The least technical layer of LLM application development; often the highest-leverage one.
An attack where adversarial text is inserted into a language model’s input — through user messages, retrieved documents, or uploaded files — to override the system prompt or manipulate the model’s behavior. Examples include hidden instructions in PDFs that say “ignore previous instructions” or invisible text in web pages that redirects the model’s output. Distinct from parser differential attacks, which corrupt the data itself before the model ever sees it.
A technique where a system first retrieves relevant documents from a verified database, then provides them to the language model as context for generation. Used by virtually every serious legal AI tool to ground outputs in real sources rather than relying on the model’s training data alone. Reduces hallucination but does not eliminate it.
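The retrieve-then-generate flow can be sketched in two steps: pick the most relevant passages, then assemble a prompt that restricts the model to them. In this sketch, keyword overlap stands in for the embedding search a real system would use, the corpus and document IDs are invented, and the final call to a model is left out.

```python
def retrieve(query, corpus, k=2):
    """Return the k passages sharing the most words with the query.

    Keyword overlap is a stand-in for embedding-based search.
    """
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(q_terms & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Assemble a prompt that limits the model to the retrieved sources."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite them by ID.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

# Hypothetical document store.
corpus = {
    "lease-4.2": "Tenant shall pay rent to the Landlord on the first day of each month.",
    "lease-9.1": "Landlord may terminate the lease if rent is unpaid for thirty days.",
    "memo-7": "The office holiday party is scheduled for December.",
}
question = "when can the landlord terminate the lease for unpaid rent"
passages = retrieve(question, corpus)
prompt = build_prompt(question, passages)
```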
A machine learning approach where a model improves by taking actions and receiving reward signals, rather than learning from labeled examples. In LLM development, reinforcement learning is applied after pre-training to teach models multi-step skills — such as navigating files, using tools, or completing agentic workflows — by rewarding successful task completions. RLHF is a specific variant that uses human preference judgments as the reward signal.
A post-training technique where human raters compare pairs of model outputs and indicate which is better, then a reward model trained on those preferences is used to fine-tune the language model via reinforcement learning. Developed in research at OpenAI and DeepMind and later adopted by labs including Anthropic as a key step in making raw pre-trained models safe and useful. Most frontier models use RLHF or a related method (such as RLAIF or DPO) to align behavior with human expectations after pre-training.
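The reward-model step is commonly trained with a pairwise loss: the loss is small when the reward model scores the human-preferred output higher than the rejected one. The sketch below shows that loss for a single comparison, assuming the standard Bradley-Terry formulation; the reward values are made up, and the subsequent reinforcement learning step is not shown.

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward scores for one pair of outputs.
loss_agrees = preference_loss(2.0, 0.0)     # reward model agrees with rater
loss_disagrees = preference_loss(0.0, 2.0)  # reward model disagrees
loss_tied = preference_loss(0.0, 0.0)       # no preference expressed
```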
Retrieval based on the meaning of a query rather than exact keyword overlap. Where traditional keyword search requires the query and document to share specific terms, semantic search uses embeddings to find conceptually related content — connecting a question about “data falsification” to a document discussing someone who “licks the pencil.” Critical for investigations where the evidence uses different vocabulary than the query.
A machine learning approach where the model is trained on labeled examples — input-output pairs where humans have identified the correct answer. The IRS Return Review Program uses supervised learning to detect known fraud patterns, training on historical returns that auditors have already flagged. Distinct from unsupervised learning, which finds patterns in unlabeled data without predefined categories.
A well-documented behavior in large language models where the model prioritizes agreement and user approval over accuracy. Sycophantic models will change correct answers when users push back, validate bad ideas, and avoid delivering uncomfortable conclusions. The trait is structural — models are trained on human feedback that rewards agreeable responses — and it makes consumer chatbots particularly dangerous as strategic advisors: they will help you plan a breach of contract without warning you it’s a breach.
TAR
(Technology-Assisted Review, predictive coding, CAL)
A supervised machine learning approach to document review in e-discovery. TAR 1.0 trains on a seed set coded by senior attorneys, then classifies the remaining corpus; TAR 2.0 (continuous active learning) updates its model as reviewers code documents, prioritizing the most informative documents for human review. Court-accepted since Da Silva Moore v. Publicis Groupe (2012), TAR is commonly reported to reach recall of 85–90% or higher, compared with 60–70% for manual human review.
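Recall here means the share of truly responsive documents that the review actually found. The sketch below works through the arithmetic with invented counts chosen to match the ranges quoted above; the numbers are illustrative, not results from any study.

```python
def recall(found_responsive, missed_responsive):
    """Fraction of all responsive documents that the review identified."""
    return found_responsive / (found_responsive + missed_responsive)

# Hypothetical corpus containing 1,000 responsive documents.
tar_recall = recall(found_responsive=850, missed_responsive=150)
manual_recall = recall(found_responsive=650, missed_responsive=350)
```

In this example the TAR workflow misses 150 responsive documents and the manual review misses 350.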
A subword unit that language models process — roughly equal to four English characters or three-quarters of a word. APIs charge separately for input tokens (what you send) and output tokens (what the model generates), with output typically costing several times more.
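The four-characters-per-token rule of thumb is enough for a back-of-the-envelope cost estimate. In the sketch below, the per-million-token prices are placeholders rather than any vendor's actual rates, and the character-based count is only an approximation of what a real tokenizer would report.

```python
def estimate_tokens(text):
    """Approximate token count using the ~4 characters per token rule."""
    return max(1, round(len(text) / 4))

def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Dollar cost of one request given prices per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Hypothetical prices: $3 per million input tokens, $15 per million output.
cost = estimate_cost(200_000, 2_000, input_price_per_m=3.0,
                     output_price_per_m=15.0)
approx_tokens = estimate_tokens("a" * 400)
```

At these assumed prices, a 200,000-token filing with a 2,000-token answer costs about $0.63, nearly all of it input.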
The neural network architecture introduced in the 2017 Google paper “Attention Is All You Need,” underlying nearly every modern language model. Built around “self-attention,” which lets the model weigh how every word in a passage relates to every other word, regardless of distance.
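The core computation can be sketched in a few lines: each position scores itself against every other position, turns the scores into weights, and outputs a weighted blend of all positions. This simplified version uses the input vectors directly as queries, keys, and values; a real transformer applies learned projection matrices first and runs many attention heads in parallel.

```python
import math

def softmax(xs):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Score this position against every position, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors))
                        for j in range(d)])
    return outputs

# Three toy "word" vectors of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```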
The practice of building software by describing desired functionality in natural language and letting an AI tool generate the code. Coined by Andrej Karpathy in early 2025. Enables associates and non-engineers to build small tools — data parsers, timeline generators, compliance dashboards — without traditional programming skills. Produces rapid initial results but often yields fragile, undocumented software that can be difficult to maintain or audit.