Video Caption Generator
Transcribes short clips from a Drive folder, deduplicates them, and generates captions plus YT/FB titles.
About the skill
What it does
Prepares short-form MP4 clips dropped into a Google Drive folder for publishing, in batch. The pipeline has three stages. (1) Transcription: each clip is transcribed locally with Whisper (model turbo). Since Whisper runs at roughly 2-3% WER on clean audio but 8-12% on real-world field audio, every transcript is treated as a draft, not a record of truth. (2) Deduplication: files that share the same transcript are treated as content duplicates, and all but one copy are removed. The exception is pairs like 0411.mp4 / 0411(1).mp4 (same audio, different on-screen title), which are recognized as A/B title variants, so neither is skipped. (3) Generation: for each unique clip it writes a first-person, conversational, hashtag-free caption of 2-4 sentences (hook first, insight second) and a title under 60 characters built on the curiosity-gap principle.
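The deduplication stage can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual code: the normalization rules, the function names, and above all the way A/B variants are detected (a trailing "(n)" suffix on the same filename stem) are hypothetical. The listing only says that such pairs are recognized, not how.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical pattern: "0411(1)" is read as a variant of stem "0411".
VARIANT_RE = re.compile(r"^(?P<stem>.+?)\((?P<n>\d+)\)$")


def transcript_signature(text: str) -> str:
    """Hash the transcript after lowercasing and dropping punctuation and extra spaces."""
    normalized = " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def base_stem(filename: str) -> str:
    """Return the filename stem with any trailing "(n)" suffix removed."""
    stem = filename.rsplit(".", 1)[0]
    match = VARIANT_RE.match(stem)
    return match.group("stem").strip() if match else stem


def split_duplicates(transcripts: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (kept, dropped) filenames for a {filename: transcript} mapping."""
    groups = defaultdict(list)
    for name in sorted(transcripts):
        groups[transcript_signature(transcripts[name])].append(name)

    kept, dropped = [], []
    for names in groups.values():
        first = names[0]
        kept.append(first)
        for other in names[1:]:
            # Assumption: same transcript plus same base stem means an A/B title variant.
            if base_stem(other) == base_stem(first):
                kept.append(other)
            else:
                dropped.append(other)
    return sorted(kept), dropped
```

In this sketch the duplicate decision itself still rests on the transcript signature; the filename is consulted only to tell an A/B variant apart from a true duplicate. Because transcripts are drafts, two copies of the same audio may not transcribe identically, so an exact hash is the simplest possible signature and a real pipeline might need a fuzzier comparison.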
Processed Drive IDs are logged to processed_ids.json, so a re-run skips clips it has already seen, which makes the automation idempotent.
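A minimal sketch of how such a log could make re-runs idempotent is shown here. The on-disk layout (a JSON list of ID strings) and the helper names are assumptions; the listing names only the file, processed_ids.json.

```python
import json
import os
import tempfile
from pathlib import Path


def load_processed(log_path: Path) -> set[str]:
    """Read the set of already-processed IDs; a missing file means none yet."""
    if not log_path.exists():
        return set()
    return set(json.loads(log_path.read_text(encoding="utf-8")))


def save_processed(log_path: Path, ids: set[str]) -> None:
    """Write to a temp file in the same folder, then atomically swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=log_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(sorted(ids), handle, indent=2)
    os.replace(tmp_name, log_path)


def run_batch(drive_ids, log_path: Path, process) -> list[str]:
    """Call process(drive_id) for unseen IDs only; return the IDs handled this run."""
    seen = load_processed(log_path)
    handled = []
    for drive_id in drive_ids:
        if drive_id in seen:
            continue  # already logged by an earlier run
        process(drive_id)
        seen.add(drive_id)
        save_processed(log_path, seen)  # persist after each clip so a crash loses little
        handled.append(drive_id)
    return handled
```

Saving after every clip rather than once at the end is a design choice in this sketch: an interrupted batch can be restarted without redoing the clips that already finished.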
When to use it
Use it when a team or creator producing regular short-form content accumulates raw clips in a folder; when each clip needs a quick transcript, caption, and title for a publishing calendar; when A/B title variants must be kept apart rather than deduplicated away; or when you want to remove the manual transcribe-and-copywrite bottleneck.
Method / frameworks
- Whisper ASR (turbo): local transcription; the transcript is always a draft.
- Content-based dedup: compares transcript signatures, not filenames; A/B variants are preserved.
- Curiosity-gap titling: tension or a number first, under 60 chars (~50-55 visible on mobile), no over-promising, to protect "Satisfied CTR".
- Hook-first captions: first person, conversational, no hashtags, 2-4 sentences.
- Idempotent batch log: processed_ids.json prevents reprocessing.
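The measurable parts of the titling and caption rules above (title length, no hashtags, 2-4 sentences) could be checked mechanically before publishing. The following is a hedged sketch with hypothetical function names; it does not attempt to judge tone, first-person voice, or whether a title over-promises, and its sentence count is only a rough heuristic.

```python
import re

MAX_TITLE_CHARS = 60  # from the rules above: titles stay under 60 characters


def count_sentences(text: str) -> int:
    """Rough count: non-empty chunks split on '.', '!' or '?' (abbreviations over-count)."""
    return len([part for part in re.split(r"[.!?]+", text) if part.strip()])


def check_title(title: str) -> list[str]:
    """Return a list of problems found in a title; an empty list means it passes."""
    problems = []
    if len(title) >= MAX_TITLE_CHARS:
        problems.append(f"title is {len(title)} chars, expected under {MAX_TITLE_CHARS}")
    return problems


def check_caption(caption: str) -> list[str]:
    """Return a list of problems found in a caption; an empty list means it passes."""
    problems = []
    if re.search(r"(?<!\w)#\w+", caption):
        problems.append("caption contains a hashtag")
    sentences = count_sentences(caption)
    if not 2 <= sentences <= 4:
        problems.append(f"caption has {sentences} sentences, expected 2-4")
    return problems
```

A generated caption or title that returns a non-empty list could be sent back for regeneration rather than published as-is.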
How do I use this skill?
Upload the video-caption-generator.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).
- Open Settings → Customize → Skills
- Upload → select the video-caption-generator.zip you downloaded
- Claude reads SKILL.md; the name + description appear. Ready ✅
Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.