ADK Eval Guide

Cloud / Infra

Manages the evaluation process for Google ADK agents with an eval-fix loop and produces a Markdown report with a score table, root-cause analysis, and fixes.

About the skill

ADK Eval Guide (google-agents-cli-eval)

A guide skill that manages the end-to-end evaluation (eval) process for agents written with the Google ADK (Agent Development Kit). It walks you through running evalsets with agents-cli eval run, setting up the eval_config.json and evalset.json schemas, choosing among the 8 evaluation criteria (tool_trajectory_avg_score, final_response_match_v2, rubric_based, hallucinations_v1, safety_v1, etc.), and configuring an LLM-as-judge.

When to use it: when you want to "run eval," "evaluate my ADK agent," "write an evalset," "debug eval scores," or compare two result files. Its real strength is the eval-fix loop: when a score drops below its threshold, the skill does not just report the failure; it diagnoses the cause, fixes the code, instruction, or evalset, and reruns the eval. It also includes a table of common pitfalls (proactivity trajectory gap, app-name mismatch, state type mismatch, thinking mode skipping tools) that maps each failure to what to fix.
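
The eval-fix loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `failing_criteria`, `eval_fix_loop`, and the `run_eval` / `apply_fix` callbacks are hypothetical names, and it assumes every criterion passes when its score is at or above its threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CriterionResult:
    """One criterion's score compared against its threshold (hypothetical type)."""
    name: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: higher is better and the threshold is inclusive.
        return self.score >= self.threshold


def failing_criteria(scores: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[CriterionResult]:
    """Return the criteria whose score is below threshold.

    A criterion missing from `scores` is treated as 0.0, so it fails.
    """
    results = [
        CriterionResult(name, scores.get(name, 0.0), threshold)
        for name, threshold in thresholds.items()
    ]
    return [r for r in results if not r.passed]


def eval_fix_loop(run_eval: Callable[[], Dict[str, float]],
                  apply_fix: Callable[[List[CriterionResult]], None],
                  thresholds: Dict[str, float],
                  max_rounds: int = 3) -> Tuple[bool, int]:
    """Run eval, fix failures, and rerun until everything passes or rounds run out.

    `run_eval` stands in for whatever produces scores (for example, parsing
    the results of an eval run); `apply_fix` stands in for the diagnosis and
    fix step. Returns (all_passed, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        failures = failing_criteria(run_eval(), thresholds)
        if not failures:
            return True, round_no
        if round_no < max_rounds:
            # Only fix when there is another round left to verify the fix.
            apply_fix(failures)
    return False, max_rounds
```

Capping the number of rounds keeps the loop from running forever when a failure cannot be fixed automatically; the final `False` result is what a deploy gate would act on.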

Output: a Markdown report with a score table, root-cause analysis, applied fixes, and the deploy-gate decision, along with runnable agents-cli command blocks. It requires no API key or web access; it relies on the local agents-cli tool (installed with uv tool install google-agents-cli).
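
To give a rough idea of what the score table and deploy-gate decision in such a report might look like, here is a small sketch. The column layout, the `render_score_table` and `deploy_gate` helpers, and the sample scores are all illustrative assumptions; the skill's real report format is not shown in this listing.

```python
from typing import Iterable, List, Tuple

# Each row is (criterion name, score, threshold); a hypothetical shape.
Row = Tuple[str, float, float]


def render_score_table(rows: Iterable[Row]) -> str:
    """Render rows as a Markdown table with a PASS/FAIL status column."""
    lines: List[str] = [
        "| Criterion | Score | Threshold | Status |",
        "| --- | --- | --- | --- |",
    ]
    for name, score, threshold in rows:
        # Assumption: a criterion passes when score >= threshold.
        status = "PASS" if score >= threshold else "FAIL"
        lines.append(f"| {name} | {score:.2f} | {threshold:.2f} | {status} |")
    return "\n".join(lines)


def deploy_gate(rows: Iterable[Row]) -> bool:
    """Allow deployment only if every criterion meets its threshold."""
    return all(score >= threshold for _, score, threshold in rows)


if __name__ == "__main__":
    sample = [
        ("tool_trajectory_avg_score", 1.0, 1.0),  # made-up sample values
        ("safety_v1", 0.5, 1.0),
    ]
    print(render_score_table(sample))
    print("Deploy gate:", "open" if deploy_gate(sample) else "blocked")
```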

How do I use this skill?

You don't "run" a skill. After installing it, you just describe your task to the agent (for example, ask it to evaluate your ADK agent), and the skill activates on its own when the task matches its description.

Upload the google-agents-cli-eval.zip you downloaded as-is. No repackaging is needed; the format is already correct (folder at root).

1. Open Settings → Customize → Skills
2. Upload → select the google-agents-cli-eval.zip you downloaded
3. Claude reads SKILL.md; the name + description appear. Ready ✅

Scripts run in Anthropic's code-execution environment (sandbox), not on your machine.