Architecture
Aegis is a LangGraph-orchestrated agent that runs a deterministic 5-node pipeline. Each node is a discrete step with a defined input state and output state. Nodes that call an LLM are skippable (--no-llm) without affecting the others.
The 5-node pipeline
rules.yaml
│
▼
plan ──► parallel_table ──► reconcile ──► remediate ──► report
│
┌──────────────────┐
│ per table: │
│ execute │
│ classify │
│ diagnose │ ← tables run concurrently
│ rca │
└──────────────────┘
1. plan
Reads rules (YAML or Python objects), validates each against the Pydantic schema, and builds an execution plan — an ordered list grouped by table. Rules with shared scope are batched; the output is a {table → [rules]} mapping consumed by the next node.
2. parallel_table
The core fan-out node. Groups rules by target table and launches a full mini-pipeline for each table concurrently using asyncio.gather:
per table (concurrent):
execute → run all rules for this table against the warehouse adapter
classify → heuristic severity triage (escalates if >5% rows fail or blast radius is high)
diagnose → LLM writes plain-English explanation + likely cause + recommended action
rca → LLM traces root cause through the OpenLineage lineage graph
Results from all tables are merged back into a single state before the next node. With N tables, the wall-clock time is bounded by the slowest table, not the sum of all tables.
3. reconcile
Handles cross-table reconciliation rules (reconcile_row_count, reconcile_column_sum, reconcile_key_match). Runs source and target queries in parallel and computes the delta against a configurable tolerance. Non-reconciliation rules pass through unchanged.
4. remediate
For each diagnosed failure, calls the LLM with the rule type, diagnosis, and RCA context to generate a targeted SQL fix. Returns a RemediationProposal with proposed_sql, confidence (high / medium / low), and a caveat explaining what to verify before running. Skipped when remediation.proposal_strategy = "none" or when --no-llm is set.
5. report
Assembles the final report: run metadata, severity breakdown, per-rule results with LLM diagnosis, RCA, and remediation SQL, total LLM cost, and run duration. Writes to stdout via Rich, to --output-json if specified, and to the SQLite audit trail (~/.aegis/history.db).
Adapters
Aegis uses a two-tier adapter pattern — one tier for LLMs, one for warehouses. Adapters are thin protocol implementations; the pipeline nodes never call a warehouse or LLM directly.
LLM adapters
─────────────────────────────────────────────────
Anthropic claude-haiku-4-5 (default)
claude-sonnet-4-5
claude-opus-4-5
OpenAI gpt-4o-mini (default)
gpt-4o
Ollama any locally-pulled model
(llama3.2, mistral, phi3, etc.)
runs on http://localhost:11434
AWS Bedrock amazon.nova-pro-v1:0 (default, no approval needed)
any Converse API-compatible model
uses ~/.aws/credentials profile
Warehouse adapters
─────────────────────────────────────────────────
DuckDB local file or in-memory
BigQuery project + dataset via service account
Databricks cluster or SQL warehouse via token
Athena S3 + Glue catalog via IAM role
Implementing a new warehouse adapter requires a single Python class with three methods: connect(), execute_scalar(sql), and execute_sample(sql, limit).
Audit trail
Every LLM call made during a run is recorded in ~/.aegis/history.db (SQLite). The schema has two tables:
- runs — one row per
aegis runinvocation:run_id,started_at,rules_file,warehouse,llm,total_cost_usd,summary_json - decisions — one row per LLM call:
run_id,node(diagnose / rca / classify / remediate),rule_id,prompt,response,model,input_tokens,output_tokens,cost_usd,latency_ms
The decisions table has an FTS5 virtual table on (prompt, response), enabling full-text search:
ShareGPT export for fine-tuning
Each entry in the JSONL file is a ShareGPT-format conversation: the system prompt, the user turn (rule context + failed rows), and the assistant turn (the actual LLM diagnosis). This format is directly compatible with fine-tuning pipelines for most open-source models.
Integrations
Airflow
The AegisOperator wraps an aegis run invocation as a native Airflow task. See Airflow Integration.
dbt
aegis dbt generate manifest.json reads a dbt manifest.json and emits Aegis rules for every not_null, unique, accepted_values, and relationships test found in the manifest. See dbt Integration.
MCP server
Aegis ships a Model Context Protocol server that exposes five tools to Claude Desktop (or any MCP-compatible client):
| Tool | Description |
|---|---|
aegis_run |
Run a rules file against a warehouse and return the report |
aegis_validate |
Validate a rules file offline and return any errors |
aegis_list_runs |
List recent runs from the audit trail |
aegis_trajectory |
Return the full node trajectory for a given run ID |
aegis_search |
Full-text search the audit trail |
See MCP Server for the Claude Desktop configuration.