Documentation menu
Getting started
Writing evals
Tooling
Going further
Extending
Add providers and graders in your own code — no fork required — via a config file or the library API.
The Provider and Grader interfaces are the extension points. You can add either in your own
project — without forking Promptopus — in two ways:
- A
promptopus.config.mjsplugin file the CLI auto-loads. - The programmatic library API (
import … from 'promptopus').
Plugins via promptopus.config.mjs
Drop a promptopus.config.mjs (or .js) next to your suite. The CLI discovers it automatically (or
pass --config <path>). It maps custom provider kinds and grader types to factory functions:
// promptopus.config.mjs
import { defineConfig, computeCostUsd } from 'promptopus';
export default defineConfig({
providers: {
// referenced in YAML as: kind: cohere
cohere: (spec) => ({
name: spec.name,
model: spec.model,
async generate(prompt, opts) {
const start = performance.now();
const res = await fetch('https://api.cohere.com/v2/chat', {
method: 'POST',
headers: {
authorization: `Bearer ${process.env[spec.apiKeyEnv ?? 'COHERE_API_KEY']}`,
'content-type': 'application/json',
},
body: JSON.stringify({ model: spec.model, messages: [{ role: 'user', content: prompt }] }),
});
const data = await res.json();
const text = data.message.content[0].text;
const tokensIn = data.usage.tokens.input;
const tokensOut = data.usage.tokens.output;
return {
text,
tokensIn,
tokensOut,
latencyMs: Math.round(performance.now() - start),
costUsd: computeCostUsd(spec.model, tokensIn, tokensOut),
};
},
}),
},
graders: {
// referenced in YAML as: type: word-count
'word-count': (spec) => ({
id: `word-count(${spec.max})`,
family: 'deterministic',
grade: ({ output }) => {
const n = output.text.trim().split(/\s+/).filter(Boolean).length;
const passed = n <= spec.max;
return {
graderId: `word-count(${spec.max})`,
family: 'deterministic',
score: passed ? 1 : 0,
passed,
detail: `${n}/${spec.max} words`,
};
},
}),
},
});
Now use them in any suite — Promptopus validates built-in types strictly but lets custom kind/type
through to your factories:
providers:
- { name: cohere-cmd, kind: cohere, model: command-r, apiKeyEnv: COHERE_API_KEY }
cases:
- id: c1
prompt: "Summarize: {{source}}"
source: "…"
graders:
- { type: non-empty }
- { type: word-count, max: 60 } # your custom grader
promptopus run my.suite.yaml # config auto-loaded
# → • plugins: promptopus.config.mjs
The factory contracts
- Provider factory:
(spec) => Provider.specis the YAML object (kind,name,model, and any extra fields you add). Return an object implementinggenerate(prompt, opts) → { text, tokensIn, tokensOut, latencyMs, costUsd }. - Grader factory:
(spec, deps) => Grader. Return{ id, family, grade(ctx) }wheregradereturns{ graderId, family, score (0–1), passed, detail }.deps.judgeis the judge client when a judge is configured — use it to build LLM-as-judge graders of your own.
Throw a ProviderError with retryable: true for transient failures so the runner’s backoff applies.
Programmatic (library) API
For full control, drive the harness from code. Everything is exported from promptopus:
import {
loadSuite,
runSuite,
createProviderRegistry,
createGraderRegistry,
} from 'promptopus';
const suite = loadSuite('my.suite.yaml');
const providers = createProviderRegistry().register('cohere', cohereFactory);
const graders = createGraderRegistry().register('word-count', wordCountFactory);
const report = await runSuite(suite, {
providers: suite.providers.map((s) => providers.create(s)),
createGrader: (spec, deps) => graders.create(spec, deps),
});
// report is the same JSON the CLI writes and the dashboard reads
Useful exports: runSuite, buildReport, loadSuite / parseSuiteConfig / resolveSuite,
createProviderRegistry / ProviderRegistry, createGraderRegistry / GraderRegistry,
OpenAIProvider / AnthropicProvider / MockProvider, computeCostUsd, ProviderError, and all the
domain types (Provider, Grader, Report, …).
Contributing a built-in
If your provider or grader is broadly useful, add it to the package itself: implement the interface in
packages/core/src/providers/ or src/graders/, register it in the built-in registry
(createProviderRegistry / createGraderRegistry), add a strict zod variant for nice validation, and a
pricing row for providers. See CONTRIBUTING.
Next: The ReadAloud dogfood.