Test agents
Test Suites and Cases
Test suites provide an automated way to evaluate and ensure the consistent behavior of AI Agents.
Key Concepts
- Test Suite: A collection of test cases designed to evaluate an agent’s performance across various scenarios.
- Test Case: Defines a specific interaction scenario involving three core components:
  - Agent: The AI agent being tested.
  - User Simulator (LLM): An LLM that simulates user interaction based on a predefined script.
  - Judge (LLM): An LLM (typically a more advanced model) that evaluates the conversation between the agent and the user simulator against a specified rubric.
- Script: Instructions defining what the User Simulator LLM should say or do at each step of the conversation, in response to the agent’s messages.
- Rubric: A set of criteria defined by the developer used by the Judge LLM to evaluate the agent’s performance during the test case.
- Scoring: The Judge LLM assigns a score between 0.0 and 1.0 based on the rubric, indicating the agent’s adherence to the desired behavior.
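To make the interplay of these concepts concrete, here is a minimal sketch of a single test-case run. All names here (`run_test_case`, the stub agent, simulator, and judge) are illustrative stand-ins, not part of the kapso API: the script drives the user simulator turn by turn, the agent responds, and the judge scores the finished conversation against the rubric.

```python
# Hypothetical sketch of one test-case run; none of these names are kapso APIs.

def run_test_case(agent, user_simulator, judge, script, rubric, max_turns=10):
    """Drive a scripted conversation and return the judge's 0.0-1.0 score."""
    conversation = []
    for step in script[:max_turns]:
        # The user simulator produces the next user message from the script step
        # and the conversation so far.
        user_message = user_simulator(step, conversation)
        conversation.append(("user", user_message))
        # The agent under test replies given the full conversation history.
        agent_reply = agent(conversation)
        conversation.append(("agent", agent_reply))
    # The judge scores the whole transcript against the rubric.
    return judge(conversation, rubric)

# Stub "LLMs" so the sketch is self-contained and runnable:
agent = lambda history: "Sure, I can help with that."
user_sim = lambda step, history: step
judge = lambda conv, rubric: 1.0 if len(conv) > 0 else 0.0

score = run_test_case(
    agent, user_sim, judge,
    script=["Hi", "I need a refund for order #123"],
    rubric="Agent stays polite and addresses the refund request",
)
assert 0.0 <= score <= 1.0
```

In a real run the three callables would be backed by LLM calls, and the judge's numeric score is what the test framework compares against a pass threshold.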
Automated testing helps maintain agent reliability and predictability during development and updates.
Testing with the CLI
Use the kapso CLI tool to manage and execute tests:

Run All Tests
kapso test --verbose: Run all test suites found in the project. The --verbose flag provides detailed output.

Run Specific Test
kapso test <path_to_test_file>: Run a specific test case file or all test cases within a directory.

Create Test Case
kapso create test-case: Interactively create a new test case file.

Create Test Suite
kapso create test-suite: Interactively create a new test suite configuration.
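A typical workflow chains these commands together: scaffold a test case, run it on its own, then run the whole suite. This is an illustrative sketch using only the commands documented above; the test-file path is hypothetical, since the actual filename depends on what you enter in the interactive prompts.

```shell
# Scaffold a new test case interactively (prompts for scenario details).
kapso create test-case

# Run just the newly created test case (hypothetical path for illustration).
kapso test tests/refund_flow_test

# Run every test suite in the project with detailed per-turn output.
kapso test --verbose
```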
Testing in Web UI
[image]