Long-Form Video Clip Pipeline

Video / Short

Turns long episodes into publish-ready highlight clips, automatically

Live output preview

Input Format: Input FormatOutputWatch the Output: Watch the Output

A plan is required to view this content

Choose a plan to access input format, sample outputs, and live previews.

View Plans →

About the skill

What it does

A five-stage automation that turns a long-form YouTube episode (podcast, interview, talk) into standalone, watchable highlight clips: Download → Transcribe → AI Segment → Verify → Cut.

Download — yt-dlp pulls the video plus auto-subtitles (VTT) at best quality.
Transcribe — openai-whisper produces a local, word-level timestamped transcript. Whisper emits segment-level timestamps natively; word boundaries are interpolated, so the medium model (~98% accuracy) is recommended for production, large for noisy audio.
AI Segment — Claude finds the 3-5 strongest standalone segments, scoring each for hook strength 1-10 (minimum 6), ensuring a complete narrative arc and clean cut boundaries.
Verify — Claude re-checks cut boundaries for sentence integrity, dropping clips that start or end mid-sentence.
Cut — FFmpeg cuts 16:9 landscape clips; longform_pipeline.py re-encodes for frame accuracy, while clip_cutter.py offers fast stream-copy (-c copy).

scored_pipeline.py adds a 10-expert LLM quality panel: it only cuts clips scoring 90+, with a dry-run mode that scores without cutting.

When to use it

Converting long YouTube content (podcasts/interviews/talks) into highlight clips
Processing a YouTube back catalog into a clips channel
Finding the best standalone segments from transcripts
Cutting clips with verified sentence boundaries
Running a high-volume clip operation ($0.50-1.00 per episode)

Method / frameworks

Hook-strength scoring (1-10, min 6) — aligned with short-form retention research: intro retention (past first 3s) ideally >70%, completion >60%. Weak-hook segments are dropped.
Narrative-arc integrity — clips must contain a complete arc watchable out of context.
Word-level timestamp verification — for clean boundaries (the WhisperX/forced-alignment principle: <100 ms alignment).
10-expert LLM panel scoring — a multi-judge quality gate (scored_pipeline).
Stream-copy vs re-encode trade-off — speed (keyframe ±1-2s) or frame-accurate cuts.

How do I use this skill?

You don't "run" a skill — after installing it you just tell the agent your task (e.g. ask for the relevant job), and the skill kicks in by itself when its description matches.

Upload the video-clip-pipeline.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).

Open Settings → Customize → Skills
Upload → select the video-clip-pipeline.zip you downloaded
Claude reads SKILL.md; the name + description appear. Ready ✅

Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.