Prompt Quality Auditor

Report

Turns a working prompt into one held to a measurable quality bar: 8 scored axes, a 0-100 score, and a revised v2.


About the skill

What it does

Takes a prompt that is already in production but only "seems to work" and measures it against an explicit quality bar. It first parses the prompt into its RTCF/RTF anatomy (Role · Task · Context · Format) plus Constraints, Examples, Reasoning, and Stop-guardrail parts, marking each part as present or absent and rating its quality; every missing mandatory part becomes a finding.

It then applies an 8-axis weighted rubric: A1 Clarity & Directness (18%), A2 Task specificity (16%), A3 Context sufficiency (12%), A4 Format & structure (16%), A5 Few-shot exemplars (12%), A6 Reasoning/CoT (10%), A7 Robustness & safety (10%), and A8 Token efficiency (6%). Each axis is scored 0-5 and the weighted total is scaled to 0-100. A ceiling rule applies: if any of A1, A2, or A4 scores 2 or lower, the total cannot exceed 64, so strong scores elsewhere cannot mask a weak core and create false confidence.
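The scoring step can be sketched in Python as follows. This is a minimal sketch assuming the 0-100 scale is a linear rescaling of the weighted 0-5 scores (so a 5 on every axis yields exactly 100); `quality_score`, `WEIGHTS`, and `GATE_AXES` are hypothetical names, not the skill's own code.

```python
from typing import Mapping

# Axis weights in percent, as listed in the skill description (they sum to 100).
WEIGHTS: dict[str, int] = {
    "A1": 18,  # Clarity & Directness
    "A2": 16,  # Task specificity
    "A3": 12,  # Context sufficiency
    "A4": 16,  # Format & structure
    "A5": 12,  # Few-shot exemplars
    "A6": 10,  # Reasoning/CoT
    "A7": 10,  # Robustness & safety
    "A8": 6,   # Token efficiency
}
GATE_AXES = ("A1", "A2", "A4")  # a score <= 2 on any of these triggers the ceiling
CEILING = 64.0


def quality_score(scores: Mapping[str, int]) -> float:
    """Return the weighted 0-100 score for per-axis scores in the range 0-5."""
    for axis in WEIGHTS:
        if not 0 <= scores[axis] <= 5:
            raise ValueError(f"{axis} must be scored 0-5, got {scores[axis]}")
    # Assumption: linear scaling, so 5 on every axis maps to exactly 100.
    total = sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS) / 5
    # Ceiling rule: a weak Clarity, Task, or Format axis caps the total at 64.
    if any(scores[axis] <= 2 for axis in GATE_AXES):
        total = min(total, CEILING)
    return total
```

For instance, a prompt scoring 5 on every axis except a 2 on A1 would earn 89.2 before the ceiling and 64 after it, while a 0 on A3 with 5s elsewhere earns 88 with no cap, because A3 is not one of the gate axes.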

Every flaw is graded critical, warn, or info, and critical findings must be resolved before the prompt ships. Fixes are then applied in severity order, and a traceable v2 prompt is written under a minimal-change principle, with each fix tied to a named principle (for example, "collapsed Task to one sentence: CRISPE Specificity" or "fenced input in <input>: injection resistance").
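Severity grading, fix ordering, and the ship gate can be sketched as below. `Severity`, `Finding`, `fix_order`, `can_ship`, and `change_log` are hypothetical names, and the severities assigned to the two example fixes are assumptions for illustration; the description does not say how those particular flaws are graded.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value = more urgent, so sorting by severity puts criticals first.
    CRITICAL = 0
    WARN = 1
    INFO = 2


@dataclass(frozen=True)
class Finding:
    severity: Severity
    change: str     # what the v2 rewrite changes
    principle: str  # the named principle the change is traced to


def fix_order(findings: list[Finding]) -> list[Finding]:
    """Order fixes by severity; sorted() is stable, so ties keep discovery order."""
    return sorted(findings, key=lambda f: f.severity)


def can_ship(open_findings: list[Finding]) -> bool:
    """A prompt with any unresolved critical finding must not ship."""
    return all(f.severity is not Severity.CRITICAL for f in open_findings)


def change_log(findings: list[Finding]) -> list[str]:
    """Render one traceable line per applied fix, most severe first."""
    return [
        f"[{f.severity.name.lower()}] {f.change} ({f.principle})"
        for f in fix_order(findings)
    ]


# Example findings; the severities here are illustrative assumptions.
findings = [
    Finding(Severity.WARN, "collapsed Task to one sentence", "CRISPE Specificity"),
    Finding(Severity.CRITICAL, "fenced input in <input>", "injection resistance"),
]
print(can_ship(findings))  # False: a critical finding is still open
for line in change_log(findings):
    print(line)
```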

When to use it

Use it when a prompt works but is unreliable: inconsistent output, format drift, hallucinations, jailbreak exposure, or token bloat. It also fits auditing an agent or tool prompt before a release, standardizing a prompt library, or answering "is this prompt good enough?" with a number instead of a gut feeling. It does not write prompts from scratch; it audits and revises an existing one.

Method / frameworks

Judgments are anchored to named sources rather than intuition: Anthropic Prompting Best Practices (be clear and direct, multishot <example> tags, chain of thought with <thinking>/<answer>, XML-tag structure, prefill), treated as the primary authority because the engine is Claude; CRISPE; RTCF/RTF; TCREI (for its Evaluate/Iterate loop); LLM-as-judge with rubric and golden-set evaluation (a G-Eval-style analytic rubric, pairwise A/B comparison, and a golden reference); and foundational techniques, namely Chain-of-Thought prompting (Wei et al. 2022), few-shot in-context learning (Brown et al. 2020), and Lost in the Middle (Liu et al. 2023), which guides where critical instructions are placed. The v1↔v2 delta is reported axis by axis, never inferred from a single example.
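The axis-by-axis comparison can be sketched as a per-axis difference of two score sheets, assuming v1 and v2 are each scored 0-5 on the same axes. `axis_delta` and `regressions` are hypothetical helpers, not part of the skill's published interface, and the numbers in the example are made up for illustration, not audit results.

```python
from typing import Mapping


def axis_delta(v1: Mapping[str, int], v2: Mapping[str, int]) -> dict[str, int]:
    """Per-axis change from v1 to v2 (positive means v2 scored higher)."""
    if set(v1) != set(v2):
        raise ValueError("v1 and v2 must be scored on the same axes")
    return {axis: v2[axis] - v1[axis] for axis in sorted(v1)}


def regressions(delta: Mapping[str, int]) -> list[str]:
    """Axes on which v2 scored lower than v1."""
    return [axis for axis, d in delta.items() if d < 0]


# Illustrative scores: clarity and format improve, token efficiency slips.
v1 = {"A1": 2, "A4": 3, "A8": 4}
v2 = {"A1": 4, "A4": 4, "A8": 3}
delta = axis_delta(v1, v2)
print(delta)               # {'A1': 2, 'A4': 1, 'A8': -1}
print(regressions(delta))  # ['A8']
```

Reporting the full per-axis delta, rather than only the change in the total, keeps a trade-off such as the A8 regression above visible even when the overall score rises.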

How do I use this skill?

You don't "run" a skill. After installing it, you describe your task to the agent (for example, ask it to audit a prompt), and the skill activates automatically when the request matches its description.

Upload the downloaded prompt-kalite-denetcisi.zip as-is; no repackaging is needed, because the archive already has the correct layout (the skill folder sits at the root).

Open Settings → Customize → Skills
Upload → select the prompt-kalite-denetcisi.zip you downloaded
Claude reads SKILL.md, and the skill's name and description appear. Ready ✅

Scripts run in Anthropic's sandboxed code-execution environment, not on your machine.