ADK Eval Guide
Manages the evaluation process for Google ADK agents with an eval-fix loop and produces a Markdown report with a score table, root-cause analysis, and fixes.
About the skill
ADK Eval Guide (google-agents-cli-eval)
A guide skill that manages the evaluation (eval) process for agents written with the Google ADK (Agent Development Kit) from start to finish. It guides you through running evalsets with agents-cli eval run, setting up the eval_config.json and evalset.json schemas, correctly selecting the 8 evaluation criteria (tool_trajectory_avg_score, final_response_match_v2, rubric_based, hallucinations_v1, safety_v1, etc.), and configuring LLM-as-judge.
When to use it: when you want to "run eval," "evaluate my ADK agent," "write an evalset," "debug eval scores," or compare two result files. Its real strength is the eval-fix loop: when a score drops below its threshold, it does not just report the failure. It diagnoses the cause, fixes the code, instruction, or evalset, and reruns the eval. It also includes a table that maps common pitfalls (proactivity trajectory gap, app-name mismatch, state type mismatch, thinking mode skipping tools) to the fix for each failure.
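The eval-fix loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `failing_criteria`, `eval_fix_loop`, the `run_eval` and `apply_fix` callbacks, and the `max_rounds` cap are all hypothetical names, and the scores are assumed to arrive as a plain mapping from criterion name to a number.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]


def failing_criteria(scores: Scores, thresholds: Scores) -> List[str]:
    """Return criteria scoring below threshold; a missing score counts as a failure."""
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None or score < minimum:
            failures.append(name)
    return failures


def eval_fix_loop(
    run_eval: Callable[[], Scores],          # hypothetical: runs the evalset, returns scores
    apply_fix: Callable[[List[str]], None],  # hypothetical: fixes code/instruction/evalset
    thresholds: Scores,
    max_rounds: int = 3,                     # assumed cap so the loop always terminates
) -> bool:
    """Evaluate, fix the failing criteria, and rerun until everything passes."""
    for _ in range(max_rounds):
        failures = failing_criteria(run_eval(), thresholds)
        if not failures:
            return True
        apply_fix(failures)
    # One last evaluation so the final fix is also checked.
    return not failing_criteria(run_eval(), thresholds)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    thresholds = {"tool_trajectory_avg_score": 1.0, "final_response_match_v2": 0.8}
    scores = {"tool_trajectory_avg_score": 0.5, "final_response_match_v2": 0.9}
    print(failing_criteria(scores, thresholds))  # ['tool_trajectory_avg_score']
```

In practice `run_eval` would wrap whatever produces the scores (for example, a call to the agents-cli tool) and `apply_fix` would be the diagnosis-and-edit step; both are left abstract here because their details are not given in this description.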
Output: a Markdown report containing a score table, root-cause analysis, applied fixes, and the deploy-gate decision, along with runnable agents-cli command blocks. It requires no API key or web access; it relies on the local agents-cli tool (installed with uv tool install google-agents-cli).
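To give a feel for the score table and deploy-gate parts of such a report, here is a small sketch. The column layout, the PASS/FAIL/BLOCKED labels, and the `render_report` helper are assumptions made for illustration; the skill's real report format may differ.

```python
from typing import Dict, Tuple


def render_report(scores: Dict[str, float], thresholds: Dict[str, float]) -> Tuple[str, bool]:
    """Build a Markdown score table and a deploy-gate line (hypothetical layout)."""
    lines = [
        "| Criterion | Score | Threshold | Status |",
        "| --- | --- | --- | --- |",
    ]
    all_pass = True
    for name, minimum in thresholds.items():
        score = scores.get(name)
        # A criterion with no recorded score is treated as failing.
        passed = score is not None and score >= minimum
        all_pass = all_pass and passed
        shown = "n/a" if score is None else f"{score:.2f}"
        status = "PASS" if passed else "FAIL"
        lines.append(f"| {name} | {shown} | {minimum:.2f} | {status} |")
    lines.append("")
    lines.append(f"Deploy gate: {'PASS' if all_pass else 'BLOCKED'}")
    return "\n".join(lines), all_pass


if __name__ == "__main__":
    # Made-up scores for illustration only.
    report, ok = render_report(
        {"tool_trajectory_avg_score": 0.5, "safety_v1": 1.0},
        {"tool_trajectory_avg_score": 1.0, "safety_v1": 0.9},
    )
    print(report)
```

The gate here blocks deployment as soon as any single criterion is below its threshold, which is the strictest reasonable reading of a "deploy-gate decision"; a weighted or per-criterion policy would be an equally plausible design.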
How do I use this skill?
Upload the google-agents-cli-eval.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).
- Open Settings → Customize → Skills
- Upload → select the google-agents-cli-eval.zip you downloaded
- Claude reads SKILL.md; the name + description appear. Ready ✅
Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.