You can create and run tests for your agent with the Kapso CLI or with our web app.

Kapso CLI

Run your tests

When you follow the Run your first agent guide, a tests directory containing a default test suite is created for you automatically. The suite includes a basic introduction test that checks whether your agent can introduce itself.

The default test is located at ./tests/{test_suite_name}/introduction_test.yaml

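This test checks whether your agent can properly introduce itself. The script field describes what the test says to your agent, and the rubric field describes how the agent's reply is scored: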

introduction_test.yaml
id: uuid
description: Tests if the agent can properly introduce itself.
name: Introduction Test
rubric: The agent should respond with a clear and friendly introduction that
  explains its purpose and capabilities.
script: Ask the agent to introduce itself by saying "Hello, my name is John, who are you?".

To run this specific test:

Run Specific Test
kapso test ./tests/{test_suite_name}/introduction_test.yaml

You’ll see detailed test results in your console, including the conversation, feedback, and score:

Test Output
✔ Found project at /Users/andres/Documents/GitHub/cli-tests-5
✔ Loading project configuration
✔ Found 1 test files
✔ Started test: Introduction Test

   ╭───────────────────────╮
   │                       │
   │   Running 1 test...   │
   │                       │
   ╰───────────────────────╯

Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Test completed: Introduction Test - Score: 0.85

Introduction Test

Score: 85.00%
Feedback: Evaluation of agent's introduction:

✅ Acknowledged user's name ("Hi John!")
✅ Identified itself as an AI assistant
✅ Expressed willingness to help
✅ Used friendly, welcoming tone
✅ Proactively asked how it could help
⚠️ Gave only a vague description of capabilities ("various tasks") rather than specific examples

Detailed Analysis:
- The agent demonstrated good social etiquette by immediately acknowledging the user's name and responding warmly
- The introduction was clear about being an AI assistant, which establishes appropriate expectations
- The agent showed initiative by following up with "How can I help you today?"
- The main area for improvement would be providing more specific examples of its capabilities instead of the general "various tasks" description
- The response was concise and professional while maintaining a friendly tone

The agent performed well overall but lost some points for not being more specific about its capabilities, which was part of the rubric requirements.

Conversation:

User: Hello, my name is John, who are you?

A: Hi John! I'm an AI assistant here to help you. I can assist you with various tasks and answer your questions.

A: How can I help you today?
----------------------------
          

   ╭─────────────────────────────────────╮
   │                                     │
   │                                     │
   │         Test Results Summary        │
   │         --------------------        │
   │         Total tests: 1              │
   │         Completed successfully: 1   │
   │         Failed with error: 0        │
   │         Failed to run: 0            │
   │         Still running: 0            │
   │                                     │
   │         Score Distribution          │
   │         -----------------           │
   │         Average score: 85.00%       │
   │         Perfect (100%): 0 tests     │
   │         High (80-99%): 1 tests      │
   │         Low (1-79%): 0 tests        │
   │         Zero (0%): 0 tests          │
   │                                     │
   │                                     │
   ╰─────────────────────────────────────╯
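The Score Distribution section groups each test's score (reported as 0.85 and displayed as 85.00%) into four bands. If you collect scores from several runs yourself, a small Python helper can reproduce the same grouping. This is a sketch based only on the band labels in the summary; how the CLI treats values that fall between the labels, such as 79.5% or 0.5%, is an assumption here.

```python
from collections import Counter

# Band labels copied from the Test Results Summary.
BANDS = ("Perfect (100%)", "High (80-99%)", "Low (1-79%)", "Zero (0%)")


def band(score: float) -> str:
    """Map a 0.0-1.0 score to one of the summary's bands.

    Assumption: any score below 0.80 but above 0 counts as Low,
    even though the label reads 1-79%.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score == 1.0:
        return BANDS[0]
    if score >= 0.80:
        return BANDS[1]
    if score > 0.0:
        return BANDS[2]
    return BANDS[3]


def summarize(scores: list[float]) -> tuple[str, dict[str, int]]:
    """Return the average as a percentage string and a count per band."""
    average = sum(scores) / len(scores) if scores else 0.0
    counts = Counter(band(s) for s in scores)
    return f"{average * 100:.2f}%", {name: counts[name] for name in BANDS}
```

For the single run above, summarize([0.85]) returns "85.00%" with one test in the High (80-99%) band and zero in the others, matching the summary.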

To run all tests in your project:

Run All Tests
kapso test

This executes every test file in your tests directory and prints a condensed, one-line score for each test:

All Tests Output
✔ Found project at /Users/andres/Documents/GitHub/cli-tests-5
✔ Loading project configuration
✔ Found 1 test files
✔ Started test: Introduction Test

   ╭───────────────────────╮
   │                       │
   │   Running 1 test...   │
   │                       │
   ╰───────────────────────╯

Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Waiting for test results... (0/1 complete)
Test completed: Introduction Test - Score: 0.85
Introduction Test - Score: 85.00%

   ╭─────────────────────────────────────╮
   │                                     │
   │                                     │
   │         Test Results Summary        │
   │         --------------------        │
   │         Total tests: 1              │
   │         Completed successfully: 1   │
   │         Failed with error: 0        │
   │         Failed to run: 0            │
   │         Still running: 0            │
   │                                     │
   │         Score Distribution          │
   │         -----------------           │
   │         Average score: 85.00%       │
   │         Perfect (100%): 0 tests     │
   │         High (80-99%): 1 tests      │
   │         Low (1-79%): 0 tests        │
   │         Zero (0%): 0 tests          │
   │                                     │
   │                                     │
   ╰─────────────────────────────────────╯
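The condensed output prints one "Name - Score: NN.NN%" line per test, which makes it possible to gate a CI job on scores. The following Python sketch is hypothetical and not a Kapso feature: it runs kapso test, extracts those lines with a regular expression based on the sample output above, and returns a non-zero exit code when any test falls below a threshold you choose. The CLI's text output is not documented here as a stable interface, and whether results go to stdout or stderr is an assumption, so both streams are scanned.

```python
import re
import subprocess
import sys

# Matches condensed result lines such as "Introduction Test - Score: 85.00%".
# Based on the sample output above; not a documented, stable format.
RESULT_LINE = re.compile(r"^(?P<name>.+?) - Score: (?P<pct>\d+(?:\.\d+)?)%$")


def parse_scores(output: str) -> dict[str, float]:
    """Return {test name: score in percent} from condensed `kapso test` output."""
    scores: dict[str, float] = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            scores[match["name"]] = float(match["pct"])
    return scores


def failing(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the names of tests scoring below `threshold` percent, sorted."""
    return sorted(name for name, pct in scores.items() if pct < threshold)


def run_and_gate(threshold: float = 80.0) -> int:
    """Run `kapso test` and return 1 if any test is below threshold, else 0."""
    result = subprocess.run(["kapso", "test"], capture_output=True, text=True)
    # Assumption: results may be printed to either stream, so scan both.
    scores = parse_scores(result.stdout + "\n" + result.stderr)
    if not scores:
        print("No score lines found; the output format may have changed.", file=sys.stderr)
        return 1
    low = failing(scores, threshold)
    for name in low:
        print(f"{name}: {scores[name]:.2f}% is below {threshold:.2f}%", file=sys.stderr)
    return 1 if low else 0
```

In a CI script you might end with sys.exit(run_and_gate(80.0)). Lines like "Test completed: Introduction Test - Score: 0.85" and the boxed summary are deliberately ignored because they do not end in a percentage. Parsing human-readable output is fragile, so treat this as a stopgap.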

You can also run tests with more detailed output using the verbose flag:

Verbose Output
kapso test --verbose

Creating additional tests

Once you’re comfortable running the default test, you can create additional test suites and test cases. To create a test suite:

Create Test Suite
kapso create test-suite

Then add test cases to your suite:

Create Test Case
kapso create test-case
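kapso create test-case is the supported way to add a test case. If you also want to generate test files from a script, the Python sketch below renders a file with the same five fields as introduction_test.yaml. The helper names are hypothetical, using a random UUID for id is an assumption based on the id: uuid placeholder in the default test, and whether the CLI accepts hand-written files in exactly this form is an assumption as well.

```python
import json
import uuid
from pathlib import Path
from typing import Optional

# Field order follows the default introduction_test.yaml.
FIELD_ORDER = ("id", "description", "name", "rubric", "script")


def render_test_case(name: str, description: str, rubric: str, script: str,
                     test_id: Optional[str] = None) -> str:
    """Return YAML text for a test case (hypothetical helper).

    Each value is written as a double-quoted string via json.dumps, which YAML
    also accepts, so quotes and colons inside the text are escaped safely.
    """
    values = {
        "id": test_id or str(uuid.uuid4()),  # assumption: id is a UUID
        "description": description,
        "name": name,
        "rubric": rubric,
        "script": script,
    }
    lines = [f"{key}: {json.dumps(values[key], ensure_ascii=False)}" for key in FIELD_ORDER]
    return "\n".join(lines) + "\n"


def write_test_case(suite_dir: Path, filename: str, **fields: str) -> Path:
    """Write a rendered test case into suite_dir and return the file path."""
    suite_dir.mkdir(parents=True, exist_ok=True)
    path = suite_dir / filename
    path.write_text(render_test_case(**fields), encoding="utf-8")
    return path
```

For example, write_test_case(Path("tests/my_suite"), "farewell_test.yaml", name="Farewell Test", description="...", rubric="...", script="...") would create a file you could then pass to kapso test (my_suite and farewell_test.yaml are placeholder names).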

Web app

Documentation for the web app is coming soon.