Prompt Quality Auditor
Report
Turns a working prompt into a measurable quality bar: 8 axes, a score, a v2.
About the skill
What it does
Takes a prompt that is already in production but only "seems to work" and moves it onto a measurable quality bar. It first parses the prompt into its RTCF/RTF anatomy (Role · Task · Context · Format) plus Constraints / Examples / Reasoning / Stop-guardrail parts, marking each part as present or absent and rating its quality; every missing mandatory part becomes a finding. Then it applies an 8-axis weighted rubric: A1 Clarity & Directness (18%), A2 Task specificity (16%), A3 Context sufficiency (12%), A4 Format & structure (16%), A5 Few-shot exemplars (12%), A6 Reasoning/CoT (10%), A7 Robustness & safety (10%), A8 Token efficiency (6%). Each axis is scored 0-5 and the weighted total is scaled to 0-100, subject to a ceiling rule: if any of A1, A2, or A4 scores 2 or lower, the total cannot exceed 64, so a prompt that fails a core axis never receives a falsely confident score.
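The scoring arithmetic can be sketched in a few lines of Python. This is an illustration only, not the skill's actual script: the function name `audit_score`, the linear scaling (weighted average divided by 5, times 100), and the rounding to one decimal are assumptions; the weights, the gated axes, and the ceiling of 64 come from the description above.

```python
# Axis weights in percent, as listed in the rubric (they sum to 100).
WEIGHTS = {"A1": 18, "A2": 16, "A3": 12, "A4": 16,
           "A5": 12, "A6": 10, "A7": 10, "A8": 6}
GATE_AXES = ("A1", "A2", "A4")  # axes that trigger the ceiling rule
CEILING = 64                    # maximum total when a gated axis is <= 2


def audit_score(axis_scores: dict[str, int]) -> float:
    """Return a 0-100 total from eight 0-5 axis scores (hypothetical helper)."""
    for axis in WEIGHTS:
        if not 0 <= axis_scores[axis] <= 5:
            raise ValueError(f"{axis} must be between 0 and 5")

    # Assumed scaling: sum(weight * score) tops out at 100 * 5 = 500,
    # so dividing by 5 maps the result onto 0-100.
    total = sum(WEIGHTS[a] * axis_scores[a] for a in WEIGHTS) / 5

    # Ceiling rule: a weak core axis caps the total at 64.
    if any(axis_scores[a] <= 2 for a in GATE_AXES):
        total = min(total, CEILING)
    return round(total, 1)


if __name__ == "__main__":
    scores = {axis: 4 for axis in WEIGHTS}
    scores["A1"] = 2            # clarity is weak, everything else is good
    print(audit_score(scores))  # uncapped value would be 72.8; the cap applies
```

In the usage example the uncapped total is 72.8, but because A1 is 2 the reported score is capped at 64.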
Every flaw is graded critical / warn / info; criticals must not ship. Fixes are then applied in severity order and a traceable v2 prompt is rewritten under a minimal-change principle, with each fix tied to a named principle ("collapsed Task to one sentence — CRISPE Specificity", "fenced input in <input> — injection resistance").
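The severity handling can be sketched the same way. The `Finding` class and the `fix_order` and `can_ship` helpers below are hypothetical names chosen for illustration; only the three severity levels, the "criticals must not ship" rule, and the severity-ordered fixing come from the description above.

```python
from dataclasses import dataclass

# Lower rank means the finding is fixed earlier.
SEVERITY_RANK = {"critical": 0, "warn": 1, "info": 2}


@dataclass
class Finding:
    severity: str   # "critical", "warn", or "info"
    issue: str      # what is wrong in the v1 prompt
    principle: str  # named principle the fix is tied to


def fix_order(findings: list[Finding]) -> list[Finding]:
    """Sort findings so criticals come first; ties keep their input order."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])


def can_ship(findings: list[Finding]) -> bool:
    """A prompt with any open critical finding must not ship."""
    return all(f.severity != "critical" for f in findings)


if __name__ == "__main__":
    findings = [
        Finding("info", "redundant politeness filler", "token efficiency"),
        Finding("critical", "user input is not fenced", "injection resistance"),
        Finding("warn", "Task spans three sentences", "CRISPE Specificity"),
    ]
    for f in fix_order(findings):
        print(f.severity, "-", f.issue, "->", f.principle)
    print("ship:", can_ship(findings))
```

Because Python's `sorted` is stable, findings of equal severity stay in the order the audit produced them, which keeps the v2 change log traceable.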
When to use it
Use it when you have a prompt that works but is unreliable: inconsistent output, format drift, hallucination, jailbreak exposure, token bloat. It also fits auditing an agent/tool prompt before a release, standardizing a prompt library, or answering "is this prompt good enough?" with a number instead of a gut feel. It is not zero-to-one prompt writing; it is audit and revision of an existing prompt.
Method / frameworks
Judgment is anchored to named canon, not intuition: Anthropic Prompting Best Practices (be clear & direct, multishot <example>, CoT <thinking>/<answer>, XML-tag structure, prefill — primary authority since the engine is Claude), CRISPE, RTCF/RTF, TCREI (the Evaluate/Iterate loop), LLM-as-judge + rubric/golden-set evaluation (G-Eval-style analytic rubric + pairwise A/B + golden reference), and foundational techniques — Chain-of-Thought (Wei et al. 2022), few-shot / in-context learning (Brown et al. 2020) and Lost-in-the-Middle (Liu et al. 2023, critical-instruction placement). The v1↔v2 delta is shown axis-by-axis, never from a single example.
How do I use this skill?
Upload the prompt-kalite-denetcisi.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).
- Open Settings → Customize → Skills
- Upload → select the prompt-kalite-denetcisi.zip you downloaded
- Claude reads SKILL.md; the name + description appear. Ready ✅
Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.