OpenClaw Prompt Budget Optimization: Cutting Token Consumption in Long Sessions

2026-04-17 · 8-minute advanced guide

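Why Trim the Prompt Budget

The problem

In a long session, every request OpenClaw sends can include:

The full system prompt
Instructions for every skill loaded at startup
Retrieval results from the memory system
Tool usage guidelines
The complete conversation history

A single request can therefore carry 20,000 to 100,000 tokens. For workloads such as:

Frequent short conversations
Cron jobs
Batch processing

token costs climb sharply, because this overhead is paid again on every request.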
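The arithmetic behind this is simple: on a short request, nearly everything sent is fixed overhead. The following minimal Python sketch makes that concrete. `overhead_share` is an illustrative name and the 200-token task size is an assumption. The 21,000-token overhead matches the sample `openclaw doctor prompt` output below.

```python
def overhead_share(fixed_overhead: int, message_tokens: int) -> float:
    """Return the fraction of a request's input tokens that is fixed overhead.

    fixed_overhead: tokens resent on every request (system prompt, skills,
        memory context, tool definitions).
    message_tokens: tokens carrying the new task itself.
    """
    total = fixed_overhead + message_tokens
    if total == 0:
        raise ValueError("request has no tokens")
    return fixed_overhead / total


# A short Cron-style request: 21,000 tokens of overhead around a 200-token task.
print(f"{overhead_share(21_000, 200):.1%} of the request is overhead")
# prints: 99.1% of the request is overhead
```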

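What changed in v2026.4.15

v2026.4.15 trims the default budget:

Startup prompt: redundant instructions removed
Skills prompt: only instructions for active skills are loaded
memory_get: returns condensed results by default
QMD reads: less context injected in long sessions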

Checking the current budget

CLI diagnostics

# Show prompt budget usage
openclaw doctor prompt

# Sample output:
# System prompt: 3200 tokens
# Skills prompts: 8500 tokens
# Memory context: 5200 tokens
# Tool definitions: 4100 tokens
# Total overhead: 21000 tokens

Per-session analysis

# Analyze token usage for a given session
openclaw session analyze <session-id>

# Or analyze the last 10 sessions
openclaw session analyze --last 10

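Trimming the startup prompt

Default optimizations

v2026.4.15 applies the following optimizations automatically:

More concise system instructions
Redundant examples removed
Shorter tool-format descriptions

This typically shrinks the startup prompt by 30% to 50%.

Custom trimming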

# config.yaml
prompts:
  system:
    style: "concise"  # concise | normal | verbose
    includeExamples: false
    maxLength: 2000  # in tokens
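OpenClaw's exact rules for each `style` are not spelled out here. What follows is only a hypothetical Python sketch of how `style`, `includeExamples` and `maxLength` might combine:

- Core instructions are always kept.
- Extra detail is dropped in `minimal`.
- Examples are added only when allowed.
- The result is capped at a token budget.

The section split, the function names and the 4-characters-per-token estimate are all assumptions. Real styles presumably also reword text, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSections:
    core: list[str]                                     # always kept
    details: list[str] = field(default_factory=list)    # dropped in minimal style
    examples: list[str] = field(default_factory=list)   # optional worked examples


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def build_system_prompt(sections: PromptSections, style: str = "concise",
                        include_examples: bool = False,
                        max_length=None) -> str:
    """Assemble a system prompt for a style, optionally capped at max_length tokens."""
    if style not in {"minimal", "concise", "normal", "verbose"}:
        raise ValueError(f"unknown style: {style!r}")
    parts = list(sections.core)
    if style != "minimal":
        parts += sections.details
    if include_examples and style != "minimal":  # minimal never carries examples
        parts += sections.examples

    kept, used = [], 0
    for part in parts:  # core comes first, so it is the last thing to be cut
        cost = estimate_tokens(part)
        if max_length is not None and used + cost > max_length:
            break
        kept.append(part)
        used += cost
    return "\n".join(kept)


sections = PromptSections(
    core=["You are a support agent. Answer briefly."],
    details=["Always cite the ticket ID when one is available."],
    examples=["Example: 'Ticket #123 is resolved.'"],
)
print(build_system_prompt(sections, style="minimal"))
```

Putting core instructions first means a tight `maxLength` cuts examples and details before it touches the instructions the agent cannot do without.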

Minimal mode

prompts:
  system:
    style: "minimal"

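Minimal mode:

Keeps only the core instructions
Removes examples entirely
Uses the shortest descriptions

Best suited to: Cron jobs that run simple tasks.

Optimizing skill prompts

On-demand loading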

plugins:
  skillsLoading:
    strategy: "on-demand"  # default: only-needed | eager
    maxActive: 10          # at most 10 skills active at once

On-demand loading can substantially reduce the skill-related portion of the prompt.
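The article does not say how OpenClaw decides which skills are active. As an illustration only, this Python sketch swaps in simple keyword matching. A skill's instructions are injected only when one of its (hypothetical) trigger words appears in the message, and at most `maxActive` skills are loaded. The skill names and token sizes are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: frozenset  # hypothetical activation keywords
    prompt_tokens: int   # size of the skill's instructions


def select_skills(skills: list, message: str, max_active: int = 10) -> list:
    """Keep only skills whose triggers appear in the message, capped at max_active."""
    words = set(message.lower().split())
    active = [s for s in skills if s.triggers & words]
    return active[:max_active]


def prompt_cost(skills: list) -> int:
    return sum(s.prompt_tokens for s in skills)


registry = [
    Skill("github", frozenset({"pr", "issue"}), 900),
    Skill("calendar", frozenset({"meeting", "schedule"}), 700),
    Skill("weather", frozenset({"weather", "forecast"}), 400),
]
chosen = select_skills(registry, "Schedule a meeting for Friday")
print([s.name for s in chosen], prompt_cost(chosen), "vs eager:", prompt_cost(registry))
# prints: ['calendar'] 700 vs eager: 2000
```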

Concise skill instructions

plugins:
  my-skill:
    promptStyle: "concise"  # use the condensed version of the skill's instructions

Optimizing memory_get

Excerpts by default

Starting with v2026.4.15, memory_get returns an excerpt rather than the full content by default:

memory:
  query:
    defaultLength: "excerpt"  # excerpt | summary | full
    maxExcerptLength: 500  # tokens
    continuationMetadata: true  # tell the agent how to fetch the full content

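What the agent sees:

Memory excerpt (500 tokens)
[Full content: 3,500 tokens; call memory_get_full to retrieve it]

When it needs the full content, the agent requests it explicitly.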
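Here is a minimal Python sketch of the excerpt-plus-continuation pattern shown above. It uses whitespace-separated words as a stand-in for tokens. `excerpt_with_continuation` is a hypothetical helper, not part of OpenClaw; only the `memory_get_full` name comes from the example output.

```python
def excerpt_with_continuation(content: str, max_tokens: int = 500) -> str:
    """Truncate content to max_tokens words and append a continuation hint."""
    # Assumption: whitespace-separated words stand in for tokens.
    words = content.split()
    if len(words) <= max_tokens:
        return content
    head = " ".join(words[:max_tokens])
    return (f"{head}\n"
            f"[Full content: {len(words)} tokens; call memory_get_full to retrieve it]")


note = " ".join(f"word{i}" for i in range(3500))
print(excerpt_with_continuation(note).splitlines()[-1])
```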

Paginated retrieval

memory:
  query:
    pagination:
      enabled: true
      pageSize: 5

Each query returns only 5 results; the agent fetches further pages when it needs them.
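Pagination can be sketched in a few lines of Python. This is a hypothetical helper, not OpenClaw's implementation. It slices the ranked results into pages of `pageSize` and reports the next page index, or `None` when nothing is left.

```python
def paginate(results: list, page: int, page_size: int = 5) -> dict:
    """Return one page of results plus the index of the next page (or None)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    start = page * page_size
    items = results[start:start + page_size]
    next_page = page + 1 if start + page_size < len(results) else None
    return {"items": items, "page": page, "next_page": next_page}


hits = [f"memory-{i}" for i in range(12)]
first = paginate(hits, 0)
print(first["items"], first["next_page"])
```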

QMD (Quick Memory Dump) optimization

Automatic summarization for long sessions

memory:
  qmd:
    longSessionSummary: true
    thresholdTurns: 20  # start summarizing after 20 turns

After 20 turns, earlier parts of the conversation are summarized automatically.
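Here is a hypothetical Python sketch of the threshold rule: once the history exceeds `thresholdTurns`, older turns are folded into a single summary entry. The article does not say how many recent turns stay verbatim, so `keep_recent=10` is an assumption. `summarize` is a placeholder for a model call.

```python
def summarize(turns: list) -> str:
    # Placeholder: a real system would ask a model to write this summary.
    return f"[Summary of {len(turns)} earlier turns]"


def compact_history(turns: list, threshold_turns: int = 20,
                    keep_recent: int = 10) -> list:
    """Once the session exceeds threshold_turns, fold older turns into a summary."""
    if not 0 < keep_recent <= threshold_turns:
        raise ValueError("keep_recent must be between 1 and threshold_turns")
    if len(turns) <= threshold_turns:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent


history = [f"turn {i}" for i in range(25)]
print(compact_history(history)[:2])
```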

Aggressive compression

memory:
  qmd:
    compressionRatio: 0.3  # keep 30%
    keyInfoPriority:
      - "user preference"
      - "decisions"
      - "bugs fixed"
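The article does not explain how `compressionRatio` and `keyInfoPriority` interact. One plausible reading is sketched here in Python as an assumption: a token budget of `compressionRatio` times the original size, filled first with items from the priority categories in their listed order. The `MemoryItem` type and the sample data are hypothetical.

```python
from typing import NamedTuple


class MemoryItem(NamedTuple):
    text: str
    category: str
    tokens: int


def compress(items: list, ratio: float = 0.3,
             priority: tuple = ("user preference", "decisions", "bugs fixed")) -> list:
    """Keep at most ratio * total tokens, filling the budget by category priority."""
    budget = round(sum(item.tokens for item in items) * ratio)
    rank = {category: i for i, category in enumerate(priority)}
    # Priority categories first (in listed order), everything else last; sorted() is stable.
    order = sorted(range(len(items)), key=lambda i: rank.get(items[i].category, len(priority)))
    kept, used = set(), 0
    for i in order:
        if used + items[i].tokens <= budget:
            kept.add(i)
            used += items[i].tokens
    return [items[i] for i in sorted(kept)]  # restore original order


history = [
    MemoryItem("small talk about the weather", "other", 400),
    MemoryItem("prefers replies in English", "user preference", 50),
    MemoryItem("chose Postgres over MySQL", "decisions", 100),
    MemoryItem("fixed the login timeout bug", "bugs fixed", 120),
    MemoryItem("raw tool output log", "other", 330),
]
for item in compress(history):
    print(item.text)
```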

Trimming tool definitions

Compressing tool descriptions

tools:
  promptOptimization:
    enabled: true
    style: "concise"

Limiting the number of tools

agents:
  my-agent:
    tools:
      include: ["read", "write", "grep"]  # list tools explicitly
      # do not use include: "*"

Include only the tools the agent actually uses.
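To see what an explicit list buys you, here is a hypothetical Python sketch that filters tool definitions by an `include` list and compares the resulting prompt size. The tool sizes are invented for illustration. Rejecting unknown names is a design choice of this sketch, so that a typo does not silently drop a tool.

```python
def filter_tools(tools: dict, include: list) -> dict:
    """Keep only explicitly listed tools; fail loudly on typos instead of guessing."""
    unknown = set(include) - set(tools)
    if unknown:
        raise KeyError(f"unknown tools: {sorted(unknown)}")
    return {name: tools[name] for name in include}


# Hypothetical definition sizes in tokens.
all_tools = {"read": 300, "write": 350, "grep": 250, "browser": 1200, "exec": 800, "cron": 600}
subset = filter_tools(all_tools, ["read", "write", "grep"])
print(sum(subset.values()), "of", sum(all_tools.values()), "tokens")
# prints: 900 of 3500 tokens
```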

Combining with caching

Prompt Caching

providers:
  anthropic:
    cache:
      enabled: true
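      system: true       # cache the system prompt
      tools: true        # cache tool definitions
      cacheControl: "5m" # 5-minute TTL

With caching enabled:

Later requests within the TTL read the same system prompt from the cache, which Anthropic bills at 10% of the normal input price: about a 90% saving on that portion (the first request writes the cache at 1.25x the normal input price)
Long conversations become much cheaper as a result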
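A rough cost model makes the caching trade-off concrete. This Python sketch makes three assumptions:

- It uses Anthropic's multipliers for the 5-minute cache: writes at 1.25x the base input price, reads at 0.1x.
- Every request lands within the TTL.
- The input price is an example $3 per million tokens.

Output tokens are ignored.

```python
def input_cost(requests: int, prefix_tokens: int, fresh_tokens: int,
               price_per_mtok: float, cached: bool) -> float:
    """Input-token cost in dollars for `requests` calls that share a prefix.

    Assumes all calls land inside the cache TTL, so the prefix is written once
    (billed at 1.25x base input) and read on every later call (0.1x base input).
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_token = price_per_mtok / 1_000_000
    if not cached:
        return requests * (prefix_tokens + fresh_tokens) * per_token
    write = prefix_tokens * 1.25 * per_token
    reads = (requests - 1) * prefix_tokens * 0.10 * per_token
    fresh = requests * fresh_tokens * per_token
    return write + reads + fresh


# 20 turns, 10K-token cached prefix, 1K new tokens per turn, assumed $3 per million input tokens.
print(round(input_cost(20, 10_000, 1_000, 3.0, cached=False), 4))  # 0.66
print(round(input_cost(20, 10_000, 1_000, 3.0, cached=True), 4))   # 0.1545
```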

KV cache (local models)

Local models served through LM Studio and similar tools can enable it as well:

providers:
  lm-studio:
    kvCache:
      enabled: true
      maxEntries: 100
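What `kvCache` does inside LM Studio is not documented here. As a stand-in, this Python sketch shows the general idea behind a bounded cache with `maxEntries`. Results are keyed by prompt prefix, and the least recently used entry is evicted once the limit is reached. `PrefixCache` is hypothetical and does not model real KV tensors.

```python
from collections import OrderedDict


class PrefixCache:
    """Bounded LRU cache keyed by prompt prefix (illustrative, not LM Studio's code)."""

    def __init__(self, max_entries: int = 100):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix: str, compute):
        """Return the cached value for prefix, computing and storing it on a miss."""
        if prefix in self._entries:
            self._entries.move_to_end(prefix)  # mark as most recently used
            self.hits += 1
            return self._entries[prefix]
        self.misses += 1
        value = compute(prefix)  # stands in for an expensive prefill
        self._entries[prefix] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


cache = PrefixCache(max_entries=100)
state = cache.get_or_compute("You are a support agent.", lambda p: f"state for {len(p)} chars")
```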

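Real-world savings

Case study: customer-service agent

One team's customer-service agent:

Before:

About 30K tokens per conversation
Monthly cost: $1,200

After applying the v2026.4.15 optimizations:

About 15K tokens per conversation (trimmed startup prompt + on-demand skills)
With Prompt Caching added: about 8K effective tokens
Monthly cost: $320

A 73% saving.

Case study: Cron jobs

100 Cron runs per day:

Before:

20K tokens per run
Monthly cost: $180

After switching to minimal mode:

5K tokens per run
Monthly cost: $45

A 75% saving.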
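The Cron figures are consistent with a 30-day month and an input price of about $3 per million tokens. That price is an inference from the numbers, not something the article states. This Python sketch reproduces them.

```python
def monthly_cost(runs_per_day: int, tokens_per_run: int,
                 price_per_mtok: float, days: int = 30) -> float:
    """Monthly input cost in dollars for a fixed-size recurring job."""
    return runs_per_day * days * tokens_per_run * price_per_mtok / 1_000_000


before = monthly_cost(100, 20_000, 3.0)  # 180.0
after = monthly_cost(100, 5_000, 3.0)    # 45.0
print(before, after, f"saving {1 - after / before:.0%}")
# prints: 180.0 45.0 saving 75%
```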

Monitoring the results

Token usage dashboard

# Daily token usage report
openclaw report tokens --daily

# Break down by agent
openclaw report tokens --by-agent

# Break down by provider
openclaw report tokens --by-provider

Cost alerts

monitoring:
  costAlerts:
    dailyBudget: 50  # USD
    monthlyBudget: 1000
    alertChannel: "feishu:ops"
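OpenClaw's alerting logic is not described here. A hypothetical Python sketch of the check behind `dailyBudget` and `monthlyBudget` might look like this, with delivery to `alertChannel` left out.

```python
def budget_alerts(daily_spend: float, month_to_date: float,
                  daily_budget: float = 50.0, monthly_budget: float = 1000.0) -> list:
    """Return alert messages for any budget that has been exceeded."""
    alerts = []
    if daily_spend > daily_budget:
        alerts.append(f"daily spend ${daily_spend:.2f} exceeds budget ${daily_budget:.2f}")
    if month_to_date > monthly_budget:
        alerts.append(f"monthly spend ${month_to_date:.2f} exceeds budget ${monthly_budget:.2f}")
    return alerts


for message in budget_alerts(daily_spend=62.5, month_to_date=400.0):
    print(message)  # a real setup would send this to the configured channel
```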

Best practices

1. Choose a budget per scenario

profiles:
  minimal:    # Cron jobs
    prompts:
      system:
        style: "minimal"
    memory:
      query:
        defaultLength: "excerpt"
  
  balanced:   # everyday conversations
    prompts:
      system:
        style: "concise"
  
  rich:       # complex tasks
    prompts:
      system:
        style: "normal"
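The section above does not cover how OpenClaw selects a profile. As an assumption, this Python sketch maps task types to the three profiles and deep-merges the chosen profile over a base config. The task-type names and the fallback to `balanced` are hypothetical.

```python
# Profiles mirroring the YAML above.
PROFILES = {
    "minimal": {"prompts": {"system": {"style": "minimal"}},
                "memory": {"query": {"defaultLength": "excerpt"}}},
    "balanced": {"prompts": {"system": {"style": "concise"}}},
    "rich": {"prompts": {"system": {"style": "normal"}}},
}

# Hypothetical mapping from task type to profile.
TASK_TO_PROFILE = {"cron": "minimal", "chat": "balanced", "complex": "rich"}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override applied on top of base (base is not mutated)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(base: dict, task_type: str) -> dict:
    profile_name = TASK_TO_PROFILE.get(task_type, "balanced")  # assumed fallback
    return deep_merge(base, PROFILES[profile_name])


base = {"prompts": {"system": {"style": "normal", "includeExamples": True}},
        "memory": {"query": {"defaultLength": "full"}}}
print(resolve_config(base, "cron"))
```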

2. Monitor and adjust

Review regularly:

openclaw report tokens --last 7d

Find which requests consume the most tokens and optimize those first.

3. Layered architecture

agents:
  # entry point: lightweight routing
  dispatcher:
    model: "haiku"
    promptStyle: "minimal"
    
  # deep processing
  worker:
    model: "sonnet"
    promptStyle: "normal"

Use a small model with a minimal prompt at the entry point, and a larger model with the full prompt for deep processing.
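In this setup, the dispatcher model itself would decide where a request goes. Purely for illustration, this Python sketch replaces that decision with a keyword-and-length heuristic. The hint words and the 40-word cutoff are assumptions.

```python
# Hypothetical hint words that suggest a request needs the worker.
COMPLEX_HINTS = ("refactor", "debug", "analyze", "design")


def route(message: str, max_simple_words: int = 40) -> dict:
    """Pick the agent, model, and prompt style for a request (heuristic stand-in)."""
    text = message.lower()
    needs_worker = (len(text.split()) > max_simple_words
                    or any(hint in text for hint in COMPLEX_HINTS))
    if needs_worker:
        return {"agent": "worker", "model": "sonnet", "promptStyle": "normal"}
    return {"agent": "dispatcher", "model": "haiku", "promptStyle": "minimal"}


print(route("What are your opening hours?"))
print(route("Please debug why the nightly export fails"))
```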

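Caveats

Prompt trimming requires OpenClaw v2026.4.15 or later
Minimal mode may degrade the agent's performance on complex tasks
Validate trimmed behavior in a test environment first
Prompt Caching requires Anthropic or another provider that supports it
Some skills depend on the full prompt and may stop working once it is trimmed
Treat cost monitoring as an ongoing practice, not a one-off optimization
Over-trimming can make the agent noticeably less capable, so keep a balance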