Promptopus
Documentation menu

Graders

Every grader type across the three families, with parameters and examples.

A grader scores one output and returns { score (0–1), passed, detail }. Graders are tagged objects discriminated by type. They come in three families; the family is inferred from the type.

Tip: put shared graders in defaults.graders and override per case only where needed. See Configuration.

Deterministic family

Fast, free, no API calls. Ideal as structural gates.

non-empty

Passes if the output (trimmed) isn’t empty.

- { type: non-empty }

max-length

Passes if output.text.lengthchars.

- { type: max-length, chars: 900 }
ParamTypeRequired
charspositive integeryes

equals

Exact match against value.

- { type: equals, value: "Paris", caseInsensitive: true, trim: true }
ParamTypeDefault
valuestring
caseInsensitivebooleanfalse
trimbooleanfalse

contains

Passes if the output contains value as a substring.

- { type: contains, value: "Tokyo", caseInsensitive: true }

regex

Passes if the pattern matches. The pattern is compiled at load time — an invalid regex is a friendly config error.

- { type: regex, pattern: "^\\d{4}-\\d{2}-\\d{2}$" }
- { type: regex, pattern: "error", flags: "i" }
ParamTypeNotes
patternstringJS regex source. Escape backslashes in YAML.
flagsstring?e.g. i, m, s.

Asserting absence. regex is a positive match, so to require that markdown is absent use a negative-lookahead pattern, e.g. ^(?![\s\S]*(?:^\s*[-*+]\s|^#))[\s\S]+$ with flag m.

is-valid-json

Passes if the (trimmed) output parses as JSON.

- { type: is-valid-json }

json-schema

Parses the output as JSON and validates it against an inline schema. Supports a common JSON-Schema subset: type (incl. integer), required, properties, items, enum, minLength/maxLength, minimum/maximum.

- type: json-schema
  schema:
    type: object
    required: [answer, confidence]
    properties:
      answer: { type: string, minLength: 1 }
      confidence: { type: number, minimum: 0, maximum: 1 }

The detail lists the first few validation errors when it fails.

LLM-as-judge family

These call the configured judge model. They require a top-level judge: block (the loader enforces this). Judge failures — a network error or malformed JSON — degrade gracefully to a failing result with a clear detail, never crashing the run. See LLM-as-judge for depth.

judge-faithfulness

Asks the judge whether the output stays faithful to the case’s source — no facts absent from or contradicting the source. Requires the case to set source.

- { type: judge-faithfulness, threshold: 0.7 }
ParamTypeDefault
thresholdnumber 0–10.7

passed is score ≥ threshold. If a case has no source, the grader fails with an explanatory detail.

judge-quality

A general quality rubric. Pass a custom rubric to steer scoring; the case’s reference (if present) is shown to the judge as a strong answer.

- type: judge-quality
  threshold: 0.7
  rubric: >-
    Score 0.0–1.0. Deduct for wordiness, meta-references ("this article"),
    missing main points, or markdown. Reserve 1.0 for a flawless answer.
ParamTypeDefault
rubricstring?a built-in general rubric
thresholdnumber 0–10.7

Cost + latency family

These grade the metrics every call already produces — no extra API calls.

latency-budget

Passes if the call’s latency is within budget. Provide maxMs and/or p95Ms (at least one). When over budget, the score degrades proportionally (budget / latency) rather than snapping to 0.

- { type: latency-budget, p95Ms: 6000 }
- { type: latency-budget, maxMs: 4000 }

The per-call grader checks each call against the budget; the report separately aggregates true p50/p95 latency across the suite (see Dashboard).

cost-budget

Passes if the call’s computed USD cost is within maxUsd.

- { type: cost-budget, maxUsd: 0.002 }
ParamTypeRequired
maxUsdpositive numberyes

How scores roll up

For each provider, the report computes mean score per family and an overall pass rate (the fraction of grader checks that passed across all of that provider’s cells). The dashboard shows these side by side. A grader’s detail string is preserved per cell for drill-down.

Next: LLM-as-judge.