Embedded AI agent that can use tools, access data, and have multi-turn conversations with users.

Execution behavior

Unlike other nodes, which execute and immediately advance, an agent node holds workflow execution at the node until the agent explicitly calls the complete_task tool. This allows:
  • Multi-turn conversations: Agent can exchange multiple messages with the user
  • Stateful execution: Maintains conversation context throughout
  • Tool orchestration: Decides when to call tools, send messages, or complete
  • Dynamic input: New user messages are automatically injected into the agent’s conversation
  • Controlled completion: Workflow only advances when agent determines task is done

Configuration

  • id: Unique node identifier
  • system_prompt: Instructions for the agent’s behavior
  • provider_model_name: AI model to use
  • temperature: Model creativity, 0.0-1.0 (default: 0.0)
  • max_iterations: Maximum tool calls/responses (default: 80)
  • max_tokens: Maximum tokens per response (default: 8192)
  • reasoning_effort: For o1 models - low, medium, high (optional)
  • webhooks: Custom API tools (optional)
  • function_tools: Deployed functions as tools (optional)
  • app_integration_tools: Pre-configured app integrations as tools (optional)
  • mcp_servers: MCP server tools (optional, HTTP streamable only)
  • observer_prompt_mode: Behavior when outbound disabled (advanced, see below)
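
As a rough illustration of the options above, the sketch below models an agent node's configuration as a TypeScript type and fills in the documented defaults. The interface name, the withDefaults helper, and the example values ("support_agent", "example-model") are hypothetical; the exact schema Kapso expects may differ.

```typescript
// Hypothetical shape for an agent node's configuration. Field names follow the
// list above; the exact schema is an assumption. Tool fields are omitted here.
interface AgentNodeConfig {
  id: string;
  system_prompt: string;
  provider_model_name: string;
  temperature?: number; // 0.0-1.0
  max_iterations?: number;
  max_tokens?: number;
  reasoning_effort?: "low" | "medium" | "high";
  observer_prompt_mode?: "interactive_chat" | "analysis_only";
}

// Defaults as documented in this section.
const AGENT_DEFAULTS = {
  temperature: 0.0,
  max_iterations: 80,
  max_tokens: 8192,
  observer_prompt_mode: "interactive_chat",
} as const;

// Merge user-supplied options over the documented defaults.
function withDefaults(config: AgentNodeConfig) {
  const t = config.temperature;
  if (t !== undefined && (t < 0 || t > 1)) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }
  return { ...AGENT_DEFAULTS, ...config };
}

const node = withDefaults({
  id: "support_agent",
  system_prompt: "You are a helpful support assistant.",
  provider_model_name: "example-model", // placeholder, not a real model name
  temperature: 0.2,
});
```

Only temperature is overridden in this example, so max_iterations, max_tokens, and observer_prompt_mode keep their documented defaults.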

Custom tools

Extend agent capabilities with external integrations.

Webhook tools

Call external APIs during agent execution. Configure URL, method, headers, and body with variable interpolation.

Function tools

Call deployed functions (Cloudflare Workers) as agent tools. Each function tool has:
  • Name: Tool identifier the agent calls (letters, numbers, underscores, dashes)
  • Description: Tells the agent when to use this tool
  • Function: Select a deployed function
  • Input Schema: Define the arguments the agent can pass
Payload structure:
{
  "input": { ... },              // tool arguments from the agent
  "execution_context": { ... },  // flow vars, system, context, metadata
  "flow_info": { ... },          // flow id, name, step_id
  "flow_events": [ ... ],        // most recent 10 events
  "whatsapp_context": { ... }    // present for WhatsApp runs
}
The agent controls only input; Kapso automatically injects the rest.
Response format: Return JSON. Include a vars object to update flow variables:
{
  "vars": {
    "lead_saved": true,
    "lead_id": "abc123"
  }
}
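
To make the payload and response shapes concrete, here is a minimal sketch of the logic a function tool might run. It assumes the payload arrives as parsed JSON with the top-level keys shown above; the inner field types, the handleSaveLead name, the email argument, and the way the lead ID is derived are all hypothetical. In a deployed function, this logic would sit behind whatever request handler the function runtime expects.

```typescript
// Top-level keys come from the payload structure above; inner types are assumptions.
interface FunctionToolPayload {
  input: Record<string, unknown>; // tool arguments from the agent
  execution_context?: Record<string, unknown>;
  flow_info?: Record<string, unknown>;
  flow_events?: unknown[];
  whatsapp_context?: Record<string, unknown>; // present for WhatsApp runs
}

// A JSON-serializable result; the optional vars object updates flow variables.
interface FunctionToolResult {
  vars?: Record<string, unknown>;
  [key: string]: unknown;
}

// Hypothetical "save lead" tool: validates the agent-supplied email and
// reports the outcome through flow variables.
function handleSaveLead(payload: FunctionToolPayload): FunctionToolResult {
  const email = payload.input["email"];
  if (typeof email !== "string" || !email.includes("@")) {
    return { error: "email is required", vars: { lead_saved: false } };
  }
  // Illustrative ID only; a real tool would persist the lead somewhere.
  const leadId = `lead_${email.toLowerCase().replace(/[^a-z0-9]/g, "_")}`;
  return { vars: { lead_saved: true, lead_id: leadId } };
}

const saved = handleSaveLead({ input: { email: "A@b.co" } });
const missing = handleSaveLead({ input: {} });
```

Because the agent controls only input, the sketch validates that field and treats everything else in the payload as trusted context injected by the platform.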

App integration tools

Call pre-configured app integrations as agent tools. Connect to HubSpot, Slack, Google Sheets, Notion, Airtable, and 2,700+ other apps via Pipedream. Each app integration tool has:
  • Name: Tool identifier the agent calls (letters, numbers, underscores, dashes)
  • Description: Tells the agent when to use this tool
  • App integration: Select a pre-configured integration from your project
How it works:
  1. Configure an app integration in your project (select app, action, connect account)
  2. Mark fields as “Pre-configured” (fixed values) or “Passed by agent” (runtime)
  3. Attach the integration to an agent step as a tool
  4. Agent calls the tool with values for “Passed by agent” fields
Input format:
{
  "input": {
    "contact_email": "user@example.com",
    "contact_name": "John Doe"
  }
}
Only fields marked as “Passed by agent” in the integration config are accepted. Pre-configured values are merged in automatically.
Response: Returns the raw response from the app integration. Check the specific app’s documentation for the response format.
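
The merge described above can be pictured roughly as follows. This is a sketch of the idea only, not Kapso's implementation: the type names, the buildIntegrationInput helper, and the list_id field are hypothetical, and rejecting an unexpected field with an error is an assumption (the platform may handle it differently).

```typescript
type FieldMode = "pre_configured" | "passed_by_agent";

interface IntegrationField {
  mode: FieldMode;
  value?: unknown; // fixed value, used only for pre-configured fields
}

// Combine fixed values with the agent's runtime input. Fields the agent
// is not allowed to set are rejected (assumed behavior).
function buildIntegrationInput(
  fields: Record<string, IntegrationField>,
  agentInput: Record<string, unknown>,
): Record<string, unknown> {
  const merged: Record<string, unknown> = {};
  for (const [name, field] of Object.entries(fields)) {
    if (field.mode === "pre_configured") merged[name] = field.value;
  }
  for (const [name, value] of Object.entries(agentInput)) {
    const field = fields[name];
    if (!field || field.mode !== "passed_by_agent") {
      throw new Error(`Field "${name}" is not marked as "Passed by agent"`);
    }
    merged[name] = value;
  }
  return merged;
}

const integrationFields: Record<string, IntegrationField> = {
  list_id: { mode: "pre_configured", value: "newsletter" }, // hypothetical field
  contact_email: { mode: "passed_by_agent" },
  contact_name: { mode: "passed_by_agent" },
};

const integrationInput = buildIntegrationInput(integrationFields, {
  contact_email: "user@example.com",
  contact_name: "John Doe",
});
```

Keeping fixed values out of the agent's reach means the model cannot redirect a call to a different list or account, even if a user asks it to.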

MCP servers

MCP server URLs and headers support variable substitution:
# URL with variables
https://api.example.com/mcp/{{system.customer.external_customer_id}}

# Headers with env and context
Authorization: Bearer ${ENV:MCP_API_KEY}
X-Customer-Id: {{system.customer.id}}
X-Phone: {{context.phone_number}}
Supported: {{vars.*}}, {{system.*}}, {{context.*}}, ${ENV:KEY}
URLs resolving to localhost or private IPs will fail in production (SSRF protection).
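
The substitution syntax above can be illustrated with a small resolver. This is a minimal sketch, assuming dotted paths are looked up in nested objects and that unresolved placeholders are left untouched; the interpolate and lookup helpers and all sample values are hypothetical, and Kapso's actual resolution rules may differ.

```typescript
interface TemplateScope {
  vars: Record<string, unknown>;
  system: Record<string, unknown>;
  context: Record<string, unknown>;
}

// Matches ${ENV:KEY} or {{vars|system|context.some.path}} in a single pass,
// so substituted values are never re-scanned for placeholders.
const PLACEHOLDER =
  /\$\{ENV:([A-Za-z0-9_]+)\}|\{\{\s*(vars|system|context)\.([A-Za-z0-9_.]+)\s*\}\}/g;

// Walk a dotted path through nested objects; returns undefined if any step is missing.
function lookup(root: unknown, path: string[]): unknown {
  let current: unknown = root;
  for (const key of path) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function interpolate(
  template: string,
  scope: TemplateScope,
  env: Record<string, string>,
): string {
  return template.replace(
    PLACEHOLDER,
    (match: string, envKey?: string, ns?: string, path?: string) => {
      let value: unknown;
      if (envKey !== undefined) {
        value = env[envKey];
      } else if (ns !== undefined && path !== undefined) {
        value = lookup(scope[ns as keyof TemplateScope], path.split("."));
      }
      // Assumption: leave unresolved placeholders as-is rather than emptying them.
      return value === undefined || value === null ? match : String(value);
    },
  );
}

const scope: TemplateScope = {
  vars: {},
  system: { customer: { id: "c_1", external_customer_id: "cust_42" } },
  context: { phone_number: "+15550100" },
};
const env = { MCP_API_KEY: "secret" };

const mcpUrl = interpolate(
  "https://api.example.com/mcp/{{system.customer.external_customer_id}}",
  scope,
  env,
);
const authHeader = interpolate("Bearer ${ENV:MCP_API_KEY}", scope, env);
```

With the sample scope, mcpUrl ends in /mcp/cust_42 and authHeader carries the value of MCP_API_KEY.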

Observer mode

When a workflow runs with outbound messages disabled (allow_outbound: false), the agent operates in “observer mode”. The observer_prompt_mode setting controls how the agent behaves:
  • interactive_chat (default): Agent chats with the operator via the Workflow Chat sidebar in the inbox. Use for workflows where human review or input is needed.
  • analysis_only: Agent runs non-interactively with no chat interface. Use for background analysis or logging workflows.
The Workflow Chat sidebar appears in the inbox when viewing conversations with active observer-mode executions.

Built-in tools

Send a message to the user without waiting for a response.
Parameters:
  • message (string, required): The text message to send
Usage: Send progress updates, confirmations, or notifications

Send media files to the user via WhatsApp.
Parameters:
  • media_url (string, required): URL of the media file
  • media_type (string, required): “image”, “video”, “audio”, or “document”
  • caption (string, optional): Caption for the media
Usage: Share images, documents, or other media content

Access flow execution context and variables.
Parameters: None
Returns: Flow variables, execution context, and metadata
Usage: Access stored data and flow state information

Get WhatsApp conversation details.
Parameters: None
Returns: Phone number, conversation ID, and contact information
Usage: Access user contact details for personalization

Store data for use in later flow steps.
Parameters:
  • key (string, required): Variable name
  • value (any, required): Value to store
Usage: Save user data, API responses, or calculated values

Retrieve previously stored data.
Parameters:
  • key (string, required): Variable name to retrieve
Usage: Access data saved in earlier steps

Get the current date and time.
Parameters: None
Returns: Current timestamp in ISO format
Usage: Time-based logic and timestamp generation

Complete the agent’s task and continue the flow.
Parameters: None
Usage: Signal task completion and advance to the next step

Transfer the conversation to a human agent.
Parameters:
  • reason (string, optional): Reason for handoff
Usage: Escalate complex issues to human support

Analyze files and answer questions about their content. Supports PDFs, images, text files, and Office documents (.docx, .xlsx, .pptx).
Parameters:
  • file_url (string, required): Kapso file URL (use WhatsApp media_data.url from get_whatsapp_context)
  • question (string, required): What you want to know about the file
Returns: Answer text, filename, MIME type
Limits:
  • Max file size: 30MB
  • Office docs: Text extracted (DOCX paragraphs, XLSX first 10 sheets/50 rows, PPTX first 40 slides)
  • Legacy formats (.doc/.xls/.ppt) are not supported; convert to a modern format first
Usage: Summarize documents, extract data from spreadsheets, analyze images

Custom tools for external API integration.
Parameters: Defined by webhook configuration
Usage: Call external APIs, fetch data, trigger actions

External inputs

When a workflow is resumed via API or receives input from non-WhatsApp sources (e.g., Slack replies, API payloads), the agent automatically tags these as external inputs to distinguish them from direct user messages.
How it works: External inputs are wrapped in <external_input> tags when presented to the agent:
<external_input>
{"status": "approved", "comments": "Looks good"}
</external_input>
The agent’s system prompt includes context that these inputs are from internal teams or external systems, not the WhatsApp user. This helps the agent:
  • Understand the input source
  • Adapt its tone (e.g., acknowledge internal team input differently than user messages)
  • Make better decisions about what to communicate to the end user
Triggering external inputs: External inputs are automatically created when:
  • Resuming a workflow via the Platform API resume endpoint with a payload
  • Using Slack integration to provide internal team responses
  • Triggering workflows via API with initial data
Example workflow: When the Slack reply arrives, it’s tagged as an external input so the agent knows it’s from your team, not the customer.
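
The tagging described in this section can be approximated with a small helper. This is a sketch only: the wrapExternalInput and toAgentMessage names and the source labels are hypothetical, and the exact serialization Kapso uses inside the tags is an assumption (JSON.stringify here produces compact JSON without spaces).

```typescript
type InputSource = "whatsapp_user" | "external";

// Wrap a non-WhatsApp payload in <external_input> tags so the agent can
// tell it apart from a direct user message.
function wrapExternalInput(payload: unknown): string {
  const body = typeof payload === "string" ? payload : JSON.stringify(payload);
  return `<external_input>\n${body}\n</external_input>`;
}

// Direct user messages pass through unchanged; everything else is tagged.
function toAgentMessage(source: InputSource, content: unknown): string {
  return source === "whatsapp_user" ? String(content) : wrapExternalInput(content);
}

const slackReply = toAgentMessage("external", {
  status: "approved",
  comments: "Looks good",
});
const userMessage = toAgentMessage("whatsapp_user", "Any update on my request?");
```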

How it works

  1. Starts conversation: Uses system prompt and conversation history
  2. Tool access: Can call built-in tools and custom webhooks
  3. Multi-turn: Continues until the agent calls complete_task or needs user input
  4. Message injection: New user messages are automatically injected during conversation
  5. External input tagging: API payloads and non-WhatsApp inputs are wrapped in <external_input> tags
  6. Workflow control: Returns the next edge when the task is completed, or wait when the agent needs input
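
The steps above amount to a loop that keeps running at the node until the agent either completes or has to wait. The sketch below simulates that control flow with a scripted decision function. All names (AgentAction, runAgentStep, lookup_order) are hypothetical, and throwing once max_iterations is exhausted is an assumption; the source does not say what happens at that limit.

```typescript
type AgentAction =
  | { type: "tool_call"; name: string }
  | { type: "complete_task" }
  | { type: "await_user" };

type StepResult = "next" | "wait";

// Runs the agent until it completes the task ("next") or needs user input ("wait").
// `decide` stands in for the model choosing its next action.
function runAgentStep(
  decide: (iteration: number) => AgentAction,
  maxIterations = 80, // documented default for max_iterations
): StepResult {
  for (let i = 0; i < maxIterations; i++) {
    const action = decide(i);
    if (action.type === "complete_task") return "next"; // workflow advances
    if (action.type === "await_user") return "wait"; // node holds execution
    // tool_call: a real runtime would execute the tool here and feed the
    // result back into the conversation before the next iteration.
  }
  throw new Error("max_iterations reached without complete_task"); // assumed behavior
}

const script: AgentAction[] = [
  { type: "tool_call", name: "lookup_order" }, // hypothetical custom tool
  { type: "complete_task" },
];
const outcome = runAgentStep((i) => script[i] ?? { type: "await_user" });
```

Here the scripted agent makes one tool call and then calls complete_task, so the step resolves to "next" and the workflow advances.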

Usage patterns

  • Support workflow
  • Data processing