Glossary & Definitions

LLM Evaluation Terms & Definitions

Comprehensive definitions for all technical terms, frameworks, and concepts used throughout the platform.

A/B Testing
Process
Experimental method that compares two versions of a system and uses statistical analysis to determine which performs better.
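
As a rough illustration of the statistical step, the sketch below runs a two-sided two-proportion z-test on success counts from two variants. The function name and the example counts are hypothetical, and a real experiment would also need a sample-size plan fixed in advance.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants are all-success or all-failure: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A passed 400 of 1000 cases, variant B 460 of 1000.
z, p = two_proportion_z_test(400, 1000, 460, 1000)
significant = p < 0.05
```

A p-value below the chosen significance level (0.05 here, by assumption) suggests the difference between the two versions is unlikely to be noise.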

Agent
Architecture
Autonomous system that can perform tasks by coordinating multiple tools and making decisions.

Answer Relevancy
Metrics
Measures whether the generated answer addresses the specific question asked, not just related topics.

Arize AX Enterprise
Technology
Full enterprise SaaS platform for LLM observability and evaluation with multi-cloud, hybrid cloud, and data center support.

Arize Phoenix
Technology
Open-source LLM observability platform from Arize AI for tracing, evaluation, and performance monitoring, including drift detection.

AutoGen
Technology
Microsoft's open-source framework for building LLM applications with multiple conversational AI agents that collaborate through natural language.

AutoMergingRetriever
Technology
A retrieval method that automatically merges related chunks to provide comprehensive context.

Axial Coding
Analysis Method
Second phase of qualitative analysis that connects and organizes initial codes into broader categories and relationships.

Baseline
Process
Initial performance measurement used as a reference point for future improvements.

Braintrust
Technology
Assessment and evaluation platform for AI applications with collaborative annotation and trace viewer capabilities.

ChatGPT (OpenAI)
Technology
Versatile LLM with strong analytical capabilities, function calling, and comprehensive evaluation skills.

Chunking
Technology
Process of breaking down documents into smaller, manageable pieces for processing and retrieval.
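
One simple way to do this is fixed-size chunking with overlap, sketched below. The function name, the character-based sizing, and the default values are assumptions for illustration; production pipelines often split on tokens, sentences, or semantic boundaries instead.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap repeats a little text between neighbouring chunks so that a sentence cut at a boundary still appears whole in at least one of them.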

CI/CD
Process
Continuous Integration/Continuous Deployment - automated pipeline for building, testing, and deploying software.

Claude (Anthropic)
Technology
Advanced LLM with sophisticated reasoning capabilities and safety-focused training for evaluation tasks.

Compliance
Process
Adherence to regulatory requirements, industry standards, and organizational policies.

Context
Technology
Retrieved information used to inform LLM responses in RAG systems.

ContextRelevancyEvaluator
Technology
A LlamaIndex evaluator that measures how relevant retrieved context is to the user's query.

Contextual Precision
Metrics
Measures the proportion of retrieved context that is relevant to answering the query.

Contextual Recall
Metrics
Measures the proportion of relevant information that was successfully retrieved from the knowledge base.
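
The two proportions in the Contextual Precision and Contextual Recall entries can be sketched as simple set ratios, assuming each chunk has an ID and a human-labelled set of relevant IDs exists. The function names are hypothetical, and evaluation libraries may use rank-weighted or LLM-judged variants rather than this plain ratio.

```python
def contextual_precision(retrieved_ids, relevant_ids):
    """Share of retrieved chunks that are relevant to the query."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant)
    return hits / len(retrieved_ids)


def contextual_recall(retrieved_ids, relevant_ids):
    """Share of relevant chunks that the retriever actually returned."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


# Hypothetical labels: the retriever returned four chunks; three are truly relevant.
retrieved = ["a", "b", "c", "d"]
relevant = ["a", "c", "e"]
precision = contextual_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
recall = contextual_recall(retrieved, relevant)        # 2 of 3 relevant were retrieved
```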

Cost Tracking
Process
Monitoring and analysis of computational costs associated with LLM operations.

CrewAI
Technology
Multi-agent framework for orchestrating role-playing AI agents that collaborate to solve complex tasks.

CSAT
Business
Customer Satisfaction Score - measures how satisfied customers are with products or services.

Customer Tools
Architecture
Specialized tools for customer information retrieval, property lookup, and seamless transfer workflows.

Data & Governance
Architecture
Framework for data ingestion, versioning, and governance in LLM systems.

Deflection
Business
Rate at which customer queries are resolved without human intervention.

Deterministic Code Checks
Evaluation Method
Rule-based validation using standard programming logic for reliable, consistent evaluation without LLM variability.
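
A minimal sketch of such a check is shown below. It assumes the model is asked to return a JSON object with `answer` and `sources` keys; those key names, the length limit, and the function name are hypothetical.

```python
import json


def check_response(raw, required_keys=("answer", "sources"), max_chars=1000):
    """Return a list of rule violations; an empty list means the response passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    failures = []
    for key in required_keys:
        if key not in data:
            failures.append(f"missing key: {key}")

    answer = data.get("answer", "")
    if isinstance(answer, str) and len(answer) > max_chars:
        failures.append(f"answer longer than {max_chars} characters")
    return failures


ok = check_response('{"answer": "Reset it from Settings.", "sources": ["kb-12"]}')
bad = check_response('{"answer": "Reset it from Settings."}')
```

Because the rules are ordinary code, the same input always produces the same verdict, which makes these checks cheap to run on every response.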

DSPy
Technology
Programmatic framework for building and optimizing LLM pipelines and agent systems using signature-based design.

Embedding
Technology
Dense vector representation of text that captures semantic meaning for similarity calculations.
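
The similarity calculation is commonly cosine similarity. The sketch below uses tiny hand-written vectors as stand-ins; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; treat it as no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not output from a real model).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```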

Faithfulness / Correctness
Metrics
Measures how factually aligned the generated output is with the provided reference context; a low score indicates hallucinated content.

FaithfulnessEvaluator
Technology
A LlamaIndex evaluator that measures how faithful generated responses are to the provided context.

G-Eval (LLM-as-Judge)
Evaluation Method
Uses an evaluator LLM to score responses against a rubric covering actionability, completeness, tone, and next-step clarity.

Gemini (Google)
Technology
Multimodal LLM with deep analytical capabilities supporting text, images, audio, and video evaluation.

Golden Dataset
Process
Curated set of 50-200 test cases with known correct answers used for evaluation and benchmarking.
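
A golden dataset might be used as in the sketch below, which scores a system by exact-match accuracy. The record fields `input` and `expected`, the normalization rule, and the stand-in `fake_system` are all assumptions; free-form answers usually need a softer metric than exact match.

```python
def exact_match_accuracy(golden_cases, generate):
    """Fraction of golden cases where the system output matches the expected answer."""
    if not golden_cases:
        raise ValueError("golden dataset is empty")

    def normalize(text):
        # Ignore case and whitespace differences when comparing answers.
        return " ".join(text.lower().split())

    correct = sum(
        1
        for case in golden_cases
        if normalize(generate(case["input"])) == normalize(case["expected"])
    )
    return correct / len(golden_cases)


# Two hypothetical cases; a real golden dataset would hold many more.
golden = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "2 + 2", "expected": "4"},
]


def fake_system(question):
    # Stand-in for the LLM application under test.
    return {"capital of France": "paris ", "2 + 2": "5"}.get(question, "")


accuracy = exact_match_accuracy(golden, fake_system)
```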

Google Sheets
Technology
Collaborative evaluation analysis platform with pivot tables, automation, and team coordination features.

Governance
Process
Framework for establishing ownership, responsibilities, and review processes for LLM evaluation.

Ground Truth
Technology
The correct or expected answer for a given query, used as a benchmark for evaluation.

Guardrails
Security
Safety mechanisms that prevent harmful, inappropriate, or non-compliant outputs from LLM systems.

GuidelineEvaluator
Technology
A LlamaIndex evaluator that checks responses against custom guidelines and policies.

Hallucination
Technology
Output in which an LLM presents fabricated or unsupported information as fact, that is, information grounded in neither the provided context nor the model's training data.

Haystack Agents
Technology
Agent orchestration within Haystack's end-to-end NLP framework, enabling RAG-enhanced agents with document processing capabilities.

Helpfulness / Utility
Metrics
Measures whether the output fully resolves the underlying user need (actionability, tone, focus).

Hierarchical Retrieval
Technology
Retrieval method that uses multiple levels of document structure for comprehensive context.

I.O.R.M.G.O.D Framework (IORMGOD)
Architecture
A production-ready architecture framework for reliable AI systems: Interface & Gateway, Orchestrator/Agent, Retrieval (RAG), Models, Guardrails, Observability & Eval, Data & Governance.

Interface & Gateway
Architecture
Entry point for user interactions, including authentication, rate limiting, and caching.

Julius AI
Technology
AI-powered notebook platform enabling natural language queries and automated insights generation.

Jupyter Notebooks
Technology
Interactive development environment for data analysis, experimentation, and reproducible research.

LangChain
Technology
Popular LLM framework for building applications with chains, agents, document loaders, and built-in evaluators.

Langfuse
Technology
A comprehensive observability platform for LLM applications providing tracing, experiments, prompt management, and scoring.

LangGraph
Technology
Low-level orchestration framework for building, managing, and deploying long-running, stateful AI agents with graph-based workflows.

LangSmith
Technology
Observability and debugging platform from the LangChain team, with deep LangChain integration for trace inspection.

LlamaIndex
Technology
A comprehensive framework for building LLM applications with advanced parsing, retrieval, evaluation, and memory capabilities.

LLM
Technology
Large Language Model - AI model trained on vast amounts of text data to understand and generate human-like text.

LLM-as-Judge
Evaluation Method
Evaluation approach using LLMs to make semantic judgments constrained to binary TRUE/FALSE outputs.

M.A.G.I. Framework (MAGI)
Framework
A comprehensive framework for production-grade LLM evaluation consisting of four pillars: Metrics, Automation, Governance, and Improvement.

Mem0
Technology
AI memory management framework providing persistent memory for agents across sessions and interactions.

Metadata Filtering
Technology
Process of filtering retrieved results based on document metadata (status, date, type, etc.).
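
A minimal sketch of the idea, assuming each retrieved result is a dict with a `metadata` mapping. The field names and values are hypothetical; vector stores usually expose their own filter syntax that runs inside the query.

```python
def filter_by_metadata(results, **required):
    """Keep only results whose metadata matches every required key/value pair."""
    return [
        result
        for result in results
        if all(result.get("metadata", {}).get(key) == value
               for key, value in required.items())
    ]


# Hypothetical retrieved results.
results = [
    {"text": "Refund policy v2", "metadata": {"status": "published", "type": "policy"}},
    {"text": "Refund policy draft", "metadata": {"status": "draft", "type": "policy"}},
    {"text": "Release notes", "metadata": {"status": "published", "type": "notes"}},
]

published_policies = filter_by_metadata(results, status="published", type="policy")
```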

Multi-Channel Communication
Architecture
Platform supporting SMS, text chat, and voice interfaces with intelligent routing and context management.

Node
Technology
Individual unit of processed content (chunk) in a document processing pipeline.

Observability
Technology
Comprehensive monitoring and logging of system behavior, performance, and quality metrics.

Open Coding
Analysis Method
Qualitative analysis technique for systematically identifying and categorizing themes in unstructured data.

Orchestrator
Architecture
System component that coordinates multiple services and manages workflow execution.

Parsing
Technology
Process of analyzing and structuring documents for further processing in LLM applications.

PII
Security
Personally Identifiable Information - data that can identify specific individuals.
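
As one illustration of how PII might be masked before text is logged or sent to a model, the sketch below redacts email addresses and US-style phone numbers with regular expressions. The patterns and placeholder labels are assumptions and are far from exhaustive; names, addresses, and ID numbers need dedicated detection.

```python
import re

# Deliberately simple patterns (assumed for illustration only).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text):
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


redacted = redact_pii("Contact jane@example.com or 555-123-4567 today")
```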

Postprocessing
Technology
Additional processing steps applied to retrieved results before final selection.

Prompt Management
Process
Systematic approach to creating, versioning, and optimizing prompts for LLM applications.

QAG (Question-Answer Generation)
Evaluation Method
An evaluation method that decomposes output into atomic claims, generates closed-ended questions, and verifies against context.
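
The three steps can be sketched as a small pipeline. In practice the claim extraction and the yes/no verification would each be an LLM call; here they are injected as plain functions, and the naive stand-ins (`split_sentences`, `substring_judge`) are hypothetical placeholders so that the example runs without a model.

```python
def qag_score(output, context, extract_claims, judge):
    """Fraction of atomic claims in `output` that the judge finds supported by `context`."""
    claims = extract_claims(output)
    if not claims:
        return 0.0
    supported = 0
    for claim in claims:
        # Closed-ended question: the judge may only answer yes (True) or no (False).
        question = f'Does the context support the claim "{claim}"? Answer yes or no.'
        if judge(question, claim, context):
            supported += 1
    return supported / len(claims)


def split_sentences(text):
    # Naive stand-in for LLM-based claim extraction: one claim per sentence.
    return [part.strip() for part in text.split(".") if part.strip()]


def substring_judge(question, claim, context):
    # Naive stand-in for an LLM judge: ignores the question and checks for
    # a literal match of the claim in the context.
    return claim.lower() in context.lower()


score = qag_score(
    output="Paris is in France. Paris has 90 million residents.",
    context="Paris is in France. It is the capital.",
    extract_claims=split_sentences,
    judge=substring_judge,
)
```

Restricting the judge to yes/no answers per claim keeps the final score a simple, auditable ratio rather than a free-form rating.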

Quality Gates
Process
Automated checkpoints in CI/CD pipelines that enforce quality thresholds before deployment.
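
A gate of this kind can be as small as the sketch below, which compares metric scores against minimum thresholds. The metric names and threshold values are hypothetical.

```python
def evaluate_gate(scores, thresholds):
    """Return (passed, failures), where failures maps metric -> (score, minimum)."""
    failures = {}
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        # A missing metric counts as a failure rather than a silent pass.
        if score is None or score < minimum:
            failures[metric] = (score, minimum)
    return not failures, failures


# Hypothetical evaluation results and thresholds.
scores = {"faithfulness": 0.92, "answer_relevancy": 0.70}
thresholds = {"faithfulness": 0.90, "answer_relevancy": 0.80}

passed, failures = evaluate_gate(scores, thresholds)
```

In a pipeline, a script like this would typically exit with a non-zero status when `passed` is false so that the CI/CD job stops before deployment.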

RAG
Technology
Retrieval-Augmented Generation - technique that combines retrieval of relevant information with text generation.

RAG over Knowledge
Architecture
Retrieval-Augmented Generation implementation over contextual knowledge bases with dynamic updates.

Reference
Technology
Ground truth or expected answer used for evaluation and comparison.

RelevancyEvaluator
Technology
A LlamaIndex evaluator that measures how relevant generated responses are to the user's query.

Reranking
Technology
Process of reordering retrieved results based on relevance scores or additional criteria.
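
The reordering step can be sketched as a sort over a second-stage relevance score. The keyword-overlap scorer below is a toy stand-in; real rerankers typically use a cross-encoder model or an LLM, and all names here are hypothetical.

```python
def keyword_overlap(query, passage):
    """Toy relevance score: number of distinct query words found in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def rerank(query, passages, score_fn=keyword_overlap, top_k=None):
    """Sort passages by descending relevance score, optionally keeping the top_k."""
    ranked = sorted(passages, key=lambda passage: score_fn(query, passage), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical first-stage retrieval output, in its original order.
retrieved = [
    "billing and invoices",
    "how to reset your password",
    "password rules",
]

top_two = rerank("reset password", retrieved, top_k=2)
```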

Score Attribution
Process
Process of assigning evaluation scores to specific components or versions of a system.

Semantic Kernel
Technology
Microsoft's plugin-based orchestration framework for building AI applications with goal-oriented agents.

SemanticSplitter
Technology
A document parsing method that splits text based on semantic similarity rather than fixed chunk sizes.

SLO
Business
Service Level Objective - a specific, measurable target for system performance and reliability.

Statsig
Technology
Data-driven experimentation platform for feature flags, statistical testing, and cohort analysis.

Threshold
Process
Minimum acceptable score for a metric that triggers quality gates and deployment decisions.

Tracing
Technology
Detailed logging of request flow through LLM systems for debugging and optimization.

Vector Store
Technology
Database optimized for storing and querying high-dimensional vectors (embeddings).

Version Control
Process
System for tracking changes to datasets, models, prompts, and evaluation criteria over time.
