Anthropic

LLM agent 教學：tools 與環境感知怎麼設計

Agent = LLM + tools + loop + stop——還有為什麼「先看再動手」是寫穩 agent 的關鍵。

TL;DR

Agent = LLM + tools + loop + 結束條件——四件事缺一不可
給抽象、可組合的 tools 比給「一鍵完成」的 hyper-specialized tool 強
讓 agent 先看環境再動手（read before write、screenshot after click）成功率立刻拉高
MCP server 是 agent tool 的最佳供應源

一個情境：「設個下週三早上九點的健身提醒」

使用者丟這句話，agent 要：

知道「現在」是哪一天（不知道下週三是幾號）
算出下週三的日期
設提醒

如果你給 agent 一個 set_reminder_for_next_wednesday_gym 這種大而全的 tool，換成「90 天保固什麼時候到期？」就完蛋。

但如果你給三個小工具：

get_current_datetime()
add_duration_to_datetime(base, days)
set_reminder(time, message)

Claude 會自己組合。健身提醒就是 1 → 2 → 3；保固到期就是 1 → 2（先問使用者買的日期）。抽象 tool 的組合空間遠大於 specialized tool 的功能總和。

Agent 的四個元素

┌──────────────────────────────────────────┐
│  Agent Loop                              │
│                                          │
│  ┌─────┐    ┌──────┐    ┌──────────┐    │
│  │ LLM │ ──►│ Tool │ ──►│  State   │    │
│  └─────┘    └──────┘    └──────────┘    │
│     ▲                        │           │
│     └────────────────────────┘           │
│                                          │
│  Stop: stop_reason == "end_turn"         │
└──────────────────────────────────────────┘

元素	是什麼	沒它會怎樣
Tools	Claude 能呼叫的函式（讀檔、查 DB、送 email…）	只能講話，不能做事
State	tool result + conversation history 累積	下一輪忘記上一輪做了什麼
Loop	拿到 tool_use → 執行 → 結果丟回 → 再叫 Claude	一回合就停，做不完事
Stop	`stop_reason: "end_turn"` 或自設上限	無窮迴圈、燒錢

這個 loop 你在 Section 4 multi-turn tool use 已經寫過了。Agent 沒有任何魔法，就是把那個 loop 給更抽象的 tools 來自由組合。

Tools 要抽象，不要 specialized

Claude Code 的 tool 設計是經典範例：

有 ✅                沒有 ❌
─────────────       ──────────────────────
bash                refactor_code
read                install_dependencies
write               run_tests_and_fix
edit                deploy_to_production
glob                update_changelog
grep                generate_pr_description

「沒有」那欄看起來都是高頻動作，為什麼不做專門 tool？因為每個專門 tool 都是一個 hard-code 的決策——它假設「refactor 就是這樣做」。但實際 refactor 的方法千百種，封裝起來反而綁死 agent 的彈性。

給 bash + read + write + edit + grep 五件抽象工具，agent 可以自己組合出 refactor、install、deploy、寫 changelog，還能處理你想都沒想到的任務。

設計 tool 的判準：

這個 tool 能否被其他組合替代？能 → 太 specialized
換個任務還用得到嗎？用不到 → 太窄
tool 的名字描述「動作」而不是「目的」？描述目的的話太具體

環境感知：先看再動手

Agent 是「眼盲」的：它丟出 tool call 之後，環境怎麼變它不知道。沒有觀察就沒有適應。

Read before write

要 Claude 改檔之前，先讓它讀檔：

system = """
你在編輯使用者的程式碼。修改任何檔案前，先呼叫 read 把現有內容讀進來。
不要假設檔案內容，永遠先觀察。
"""

聽起來很廢話，但少了這一句，Claude 常會「猜」現有 code 結構然後寫出衝突的修改。Computer Use 的設計也是同個邏輯：每次 click 之後自動回傳一張 screenshot，讓 Claude 看到操作結果再決定下一步。

給 agent 「驗收」的工具

如果 agent 在生成影片，給它一個查驗工具：

tools = [
    {"name": "ffmpeg", ...},                     # 做事用
    {"name": "extract_screenshot", ...},          # 驗收用
    {"name": "generate_caption_with_whisper", ...}, # 驗收用
]

system prompt：

生成影片後：
1. 用 extract_screenshot 抽 5 張時間點的截圖確認畫面
2. 用 whisper 產字幕檔，比對對白時間點是否正確
3. 有問題就回到生成步驟調整參數

這就是上一篇提到的 evaluator-optimizer pattern 用在 agent 裡：agent 自己做、自己評、不行再重做。

MCP server 是 agent tool 的最佳供應源

寫一個 agent 通常要把 tool 接到外部世界：DB、Sentry、Jira、Figma、Slack…如果每接一個都要自己包 schema + 寫執行邏輯，工作量爆炸。

MCP 解決這件事：它定義了「server 怎麼宣告 tool / 怎麼回 result」的標準協定。生態系裡已經有：

sentry-mcp — 抓 production error
playwright-mcp — 跑瀏覽器測試
mcp-atlassian — 讀 Jira / Confluence
slack-mcp — 發訊息通知人類
figma-context-mcp — 把設計稿丟給 Claude

接 MCP server，agent 立刻多一批工具可用，而且換 client 不用重寫 server。詳細協定跟自己寫 server 看 MCP 系列。

接下來

最後一篇收尾——看 Anthropic 自己怎麼把 agent 概念做成產品：Claude Code、Computer Use、Claude Agent SDK 三個產品的定位差在哪、什麼時候自己寫 vs 直接用 Anthropic 現成的，順便給課程結束後的下一步建議。