mirror of https://github.com/promptfoo/promptfoo.git synced 2026-06-22 16:15:37 -06:00

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. https://promptfoo.dev

ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework llmops pentesting prompt-engineering prompt-testing prompts rag red-teaming testing vulnerability-scanners

TypeScript 96.8%
CSS 1.7%
JavaScript 1.1%
Python 0.2%
HTML 0.1%

Yufeng He 644e71b06b fix(testCaseReader): preserve JSONL row description instead of overwriting (#9840 ) Co-authored-by: mldangelo <michael.l.dangelo@gmail.com>		2026-06-22 14:04:17 -04:00
.agents	docs(agents): improve Codex skill routing (#9130 )	2026-05-06 14:16:48 -04:00
.claude	chore: use local block-no-verify install instead of npx in Claude Code hook (#9675 )	2026-06-09 17:10:51 -07:00
.claude-plugin	feat(redteam): publish all four promptfoo skills to the Claude Code marketplace (#9665 )	2026-06-09 08:37:43 -07:00
.cursor	chore(deps): remove unused ts-node dependency (#6731 )	2025-12-17 00:06:12 -08:00
.devcontainer	chore(deps): update mcr.microsoft.com/vscode/devcontainers/typescript-node docker tag to v24 (#7059 )	2026-01-14 10:26:05 -08:00
.github	fix: restore Docker CLI links and validate rate-limit reset metadata (#9790 )	2026-06-17 13:32:19 -04:00
.husky	chore: tighten knip dead-file checks (#9074 )	2026-05-03 13:55:33 -04:00
.vscode	chore: update CODEOWNERS handles and VS Code association (#8299 )	2026-03-24 09:28:00 -07:00
architecture	feat(providers): add Moonshot (Kimi) provider (#9672 )	2026-06-18 00:13:49 -04:00
code-scan-action	fix(deps): update type definitions (#9832 )	2026-06-21 20:51:02 -04:00
docs	refactor(eval): add evaluation store port (#9601 )	2026-06-03 14:15:56 -04:00
drizzle	docs(agents): add subsystem AGENTS.md context files (#9579 )	2026-06-02 00:48:44 -04:00
examples	fix(deps): update opentelemetry (#9827 )	2026-06-21 20:03:45 -04:00
helm/chart/promptfoo	fix(helm): correct Docker registry domain from fghcr.io to ghcr.io (#7056 )	2026-01-14 09:07:06 -08:00
plugins	fix(redteam): publish marketplace skill fixes (#9676 )	2026-06-09 16:55:12 -07:00
scripts	fix(deps): constrain undici to <7.27.1 to fix Node 26 "terminated" error (#9668 )	2026-06-09 15:13:25 -07:00
site	fix(csv): preserve quoted commas in contains-any/all assertion values (#9761 )	2026-06-21 21:48:19 -07:00
src	fix(testCaseReader): preserve JSONL row description instead of overwriting (#9840 )	2026-06-22 14:04:17 -04:00
test	fix(testCaseReader): preserve JSONL row description instead of overwriting (#9840 )	2026-06-22 14:04:17 -04:00
tools/biome	test: isolate env mutations in root tests (#8789 )	2026-04-18 00:41:57 -07:00
.biomeignore	chore(providers): remove adaline gateway provider (#6999 )	2026-01-10 03:51:02 -05:00
.dockerignore	feat: Migrate NextUI to a React App (#1637 )	2024-09-16 21:38:27 -06:00
.git-blame-ignore-revs	chore(biome): run linter (#6761 )	2025-12-18 09:29:32 -08:00
.gitignore	feat(providers): support Agents SDK 0.9 workflows (#9128 )	2026-05-08 14:38:34 -04:00
.mailmap	chore: add mailmap aliases for public handles (#8458 )	2026-04-02 14:46:58 -07:00
.npmignore	docs: Merge docs into main repo (#317 )	2023-11-30 11:23:35 -08:00
.npmrc	fix(deps): avoid incompatible npm release-age config (#9244 )	2026-05-16 09:12:15 -07:00
.nvmrc	chore(deps): update node.js (#9802 )	2026-06-18 10:37:54 -04:00
.prettierignore	test: add vitest coverage configuration for all test suites (#7154 )	2026-01-26 12:07:07 -08:00
.prettierrc.yaml	chore: migrate from ESLint + Prettier to Biome (#4903 )	2025-07-13 00:11:30 -04:00
.release-please-manifest.json	chore(main): release code-scan-action 0.1.8 (#9602 )	2026-06-16 13:45:57 -04:00
.rubocop.yml	feat: Add ruby provider (#5902 )	2025-10-13 09:21:41 -07:00
.ruff.toml	feat: Claude Agent SDK provider support (#5509 )	2025-10-13 10:19:57 -07:00
AGENTS.md	docs(agents): watch main CI after landing a PR and fix flakes at the source (#9678 )	2026-06-09 22:02:09 -07:00
biome.jsonc	chore(deps): update biome to v2.4.16 (#9648 )	2026-06-08 10:31:34 -07:00
CHANGELOG.md	chore(main): release 0.121.17 (#9770 )	2026-06-16 12:58:50 -04:00
CITATION.cff	docs: add faizan as a contributor in citation file (#6879 )	2025-12-29 19:29:27 -05:00
CLAUDE.md	chore: consolidate agent instruction files using AGENTS.md standard (#6398 )	2025-11-28 19:23:19 -05:00
CODE_OF_CONDUCT.md	docs: add Contributor Covenant 3.0 Code of Conduct (#7022 )	2026-01-12 15:43:01 -08:00
codecov.yml	ci(codecov): make project coverage status informational (#9755 )	2026-06-16 09:53:47 -04:00
CONTRIBUTING.md	chore: add minimumReleaseAge policy for npm dependencies (#6383 )	2025-11-27 14:09:25 -05:00
Dockerfile	fix: restore Docker CLI links and validate rate-limit reset metadata (#9790 )	2026-06-17 13:32:19 -04:00
drizzle.config.ts	chore: migrate drizzle (#1922 )	2024-10-17 14:22:42 -07:00
knip.json	chore: tighten knip dead-file checks (#9074 )	2026-05-03 13:55:33 -04:00
LICENSE	chore: update year	2025-01-16 15:07:58 -08:00
package-lock.json	chore(deps): update ibm packages to ^1.7.14 (#9839 )	2026-06-22 08:32:12 -04:00
package.json	chore(deps): update ibm packages to ^1.7.14 (#9839 )	2026-06-22 08:32:12 -04:00
pnpm-workspace.yaml	chore(build): add pnpm support (#3307 )	2025-03-06 11:19:37 -08:00
README.md	chore(build): add Node 26 support (#9222 )	2026-05-13 17:47:37 -07:00
release-please-config.json	chore(release-please): bump last-release-sha to clear drift guard (#9844 )	2026-06-22 12:04:53 -04:00
renovate.json	chore(deps): hold tsdown on 0.21.x while Node 20 is supported (#9731 )	2026-06-15 11:53:52 -04:00
SECURITY.md	docs: clarify runtime feedback-loop scope (#9314 )	2026-05-20 22:55:21 -04:00
tsconfig.json	feat(build): publish a lightweight promptfoo/contracts subpath (#9535 )	2026-05-31 21:09:33 -04:00
tsdown.config.ts	feat(build): publish a lightweight promptfoo/contracts subpath (#9535 )	2026-05-31 21:09:33 -04:00
vitest.config.ts	test: stop intermittent forks-worker crash from failing green CI shards (#9681 )	2026-06-09 22:01:59 -07:00
vitest.integration.config.ts	test: enforce root TypeScript coverage (#9101 )	2026-05-04 14:24:15 -04:00
vitest.setup.ts	fix(db): isolate libsql test databases (#9504 )	2026-05-28 16:29:21 -04:00
vitest.smoke.config.ts	test: add CLI and library smoke tests (#6669 )	2025-12-29 20:45:53 -05:00

README.md

Promptfoo: LLM evals & red teaming

promptfoo is a CLI and library for evaluating and red-teaming LLM apps. Stop the trial-and-error approach - start shipping secure, reliable AI apps.

Website · Getting Started · Red Teaming · Documentation · Discord

Promptfoo is now part of OpenAI. Promptfoo remains open source and MIT licensed. Read the company update.

Quick Start

Requires Node.js ^20.20.0 or >=22.22.0 for npm and npx usage.

npm install -g promptfoo
promptfoo init --example getting-started

Also available via brew install promptfoo and pip install promptfoo. You can also use npx promptfoo@latest to run any command without installing.

Most LLM providers require an API key. Set yours as an environment variable:

export OPENAI_API_KEY=sk-abc123

Change into the example directory, then run an eval and view the results:

cd getting-started
promptfoo eval
promptfoo view

See Getting Started (evals) or Red Teaming (vulnerability scanning) for more.
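
To make the "simple declarative configs" idea more concrete, here is a minimal sketch of what an eval configuration might look like when written as a plain TypeScript object. The field names (`prompts`, `providers`, `tests`, `vars`, `assert`) are meant to mirror the general shape of a promptfoo config, but the local types, the provider ID, the sample values, and the `countRuns` helper are all assumptions made for this illustration and are not imported from promptfoo.

```typescript
// Hypothetical local types that mirror the general shape of a promptfoo config.
// They are defined here for illustration only, not imported from the library.
interface Assertion {
  type: string; // assumed assertion type, e.g. a case-insensitive substring check
  value: string;
}

interface TestCase {
  vars: Record<string, string>; // values substituted into {{placeholders}}
  assert: Assertion[];
}

interface EvalConfig {
  description: string;
  prompts: string[];
  providers: string[];
  tests: TestCase[];
}

// All values below are sample data chosen for this sketch.
const config: EvalConfig = {
  description: "Translation smoke test",
  prompts: ["Translate the following text to {{language}}: {{input}}"],
  providers: ["openai:gpt-4o-mini"],
  tests: [
    {
      vars: { language: "French", input: "Hello world" },
      assert: [{ type: "icontains", value: "bonjour" }],
    },
    {
      vars: { language: "Spanish", input: "Good morning" },
      assert: [{ type: "icontains", value: "buenos" }],
    },
  ],
};

// Assumes every prompt is run against every provider for every test case.
function countRuns(c: EvalConfig): number {
  return c.prompts.length * c.providers.length * c.tests.length;
}

// Serialize the object so it could be written out as a config file.
const serialized: string = JSON.stringify(config, null, 2);

console.log(`${countRuns(config)} runs planned`);
console.log(serialized);
```

Keeping the configuration as data rather than code is what lets the same file drive both local runs and CI checks. Treat the exact schema here as a sketch and check the Getting Started guide for the authoritative field names.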

What can you do with Promptfoo?

Test your prompts and models with automated evaluations
Secure your LLM apps with red teaming and vulnerability scanning
Compare models side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and more)
Automate checks in CI/CD
Review pull requests for LLM-related security and compliance issues with code scanning
Share results with your team

Here's what it looks like in action:

It works on the command line too:

It can also generate security vulnerability reports:

Why Promptfoo?

Developer-first: Fast, with features like live reload and caching
Private: LLM evals run 100% locally - your prompts never leave your machine
Flexible: Works with any LLM API or programming language
Battle-tested: Powers LLM apps serving 10M+ users in production
Data-driven: Make decisions based on metrics, not gut feel
Open source: MIT licensed, with an active community

Learn More

Contributing

We welcome contributions! Check out our contributing guide to get started.

Join our Discord community for help and discussion.