Rule Schema Reference
All Aegis rules follow a consistent YAML schema anchored on apiVersion: aegis.dev/v1.
Full schema
apiVersion: aegis.dev/v1
kind: DataQualityRule
metadata:
id: <string> # required — unique identifier
severity: critical|high|medium|low|info # default: medium
domain: <string> # optional — e.g. retail, banking
owner: <string> # optional — team or person responsible
tags: [<string>, ...] # optional — for filtering and grouping
description: <string> # optional — human-readable description
scope:
warehouse: duckdb # default: duckdb
database: <string> # optional
schema: <string> # optional
table: <string> # required
columns: [<string>, ...] # required for column-scoped rules
logic:
type: <rule_type> # required — see types below
# type-specific fields follow
diagnosis:
common_causes: [<string>, ...] # hints for LLM diagnosis
lineage_hints:
upstream_tables: [<string>, ...]
remediation:
auto_remediate: false # default: false — never auto-fixes
proposal_strategy: llm_simple|llm_with_lineage|none
sla:
detection_window: 1h # how fast failures must be detected
notification_target: slack://channel # where to send alerts
Rule types
Completeness
not_null
Fails if any value in columns[0] is NULL.
not_empty_string
Fails if any value in columns[0] is NULL, empty string '', or whitespace-only.
null_percentage_below
Fails if the percentage of NULL values in columns[0] exceeds threshold percent.
scope:
table: products
columns: [description]
logic:
type: null_percentage_below
threshold: 5.0 # fail if > 5% nulls
Uniqueness
unique
Fails if any value in columns[0] appears more than once (excluding NULLs).
composite_unique
Fails if any combination of values across all listed columns is duplicated.
duplicate_percentage_below
Fails if the percentage of duplicate values in columns[0] exceeds threshold.
scope:
table: orders
columns: [order_id]
logic:
type: duplicate_percentage_below
threshold: 1.0 # fail if > 1% duplicates
Validity — Numeric
sql_expression
Fails for any row where the SQL expression evaluates to false. The expression is a WHERE clause for rows that pass.
between
Fails if columns[0] is outside the range [min_value, max_value].
min_value_check
Fails if columns[0] is less than min_value.
max_value_check
Fails if columns[0] exceeds max_value.
Validity — String
regex_match
Fails if columns[0] does not match pattern (uses DuckDB regexp_matches).
scope:
table: customers
columns: [email]
logic:
type: regex_match
pattern: '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$'
accepted_values
Fails if columns[0] contains a value not in the values list.
scope:
table: orders
columns: [status]
logic:
type: accepted_values
values: [pending, shipped, completed, refunded]
not_accepted_values
Fails if columns[0] contains any value in the prohibited values list.
scope:
table: orders
columns: [status]
logic:
type: not_accepted_values
values: [test, dummy, placeholder]
column_exists
Fails if columns[0] does not exist in the table schema.
Validity — Temporal
no_future_dates
Fails if columns[0] contains any date greater than today.
date_order
Fails if columns[0] is greater than column_b (i.e. start > end).
freshness
Fails if the maximum value of columns[0] is older than threshold hours ago. If no column is specified, checks the table's last modification time.
scope:
table: orders
columns: [updated_at]
logic:
type: freshness
threshold: 24 # fail if no data newer than 24 hours
unit: hours
Referential Integrity
foreign_key
Fails if any non-null value in columns[0] does not exist in reference_table.reference_column.
scope:
table: orders
columns: [customer_id]
logic:
type: foreign_key
reference_table: customers
reference_column: customer_id
conditional_not_null
Fails if columns[0] is NULL when condition is true.
scope:
table: orders
columns: [refund_reason]
logic:
type: conditional_not_null
condition: "status = 'refunded'"
Statistical
mean_between
Fails if the mean of columns[0] is outside [min_value, max_value].
stddev_below
Fails if the standard deviation of columns[0] exceeds threshold.
column_sum_between
Fails if the sum of columns[0] is outside [min_value, max_value].
scope:
table: daily_settlements
columns: [amount]
logic:
type: column_sum_between
min_value: 0
max_value: 1000000
Volume
row_count
Fails if the table has fewer than threshold rows.
row_count_between
Fails if the row count is outside [min_value, max_value].
custom_sql
Runs an arbitrary SQL query. The query must return a single boolean value — true means the check passed.
scope:
table: orders
logic:
type: custom_sql
query: |
SELECT COUNT(*) = 0
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
Severity levels
| Level | Meaning | Typical use |
|---|---|---|
critical |
Data is unusable — pipeline should stop | NULL primary keys, broken foreign keys |
high |
Significant quality issue — must be fixed soon | Negative revenue, invalid status codes |
medium |
Noticeable quality issue — investigate | Freshness exceeded, high null rate |
low |
Minor quality concern | Missing optional fields |
info |
Informational — no action required | Statistical baselines |
Built-in templates
Rather than writing logic from scratch, reference a named template from the catalog:
aegis rules list # see all 31 templates
aegis rules list --category validity # filter by category
aegis rules list --json # machine-readable output
Each template maps directly to one of the 31 rule types above with sensible defaults already set.