Video Caption Generator

Transcribes short clips from a Drive folder, deduplicates, and generates captions plus YT/FB titles.

About the skill

What it does

Prepares short-form MP4 clips dropped into a Google Drive folder for publishing, in batch. The pipeline has three stages:

  1. Transcription: each clip is transcribed locally with Whisper (model turbo). Because Whisper runs at roughly 2-3% WER on clean audio but 8-12% on real-world field audio, every transcript is treated as a draft, not a record of truth.
  2. Deduplication: files that share the same transcript are treated as content duplicates, and the extra copies are removed. Pairs like 0411.mp4 / 0411(1).mp4 (same audio, different on-screen title) are the exception: they are recognized as A/B title variants, and neither is skipped.
  3. Generation: for each unique clip it writes a first-person, conversational, hashtag-free caption of 2-4 sentences (hook first, insight second) and a title under 60 characters built on the curiosity-gap principle.
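The deduplication stage can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script. It assumes the transcript signature is a hash of the lowercased, punctuation-stripped transcript, and that A/B variants are recognized by a trailing "(n)" in the filename; both rules, and the names transcript_signature, variant_stem, and dedupe, are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: an A/B variant is a copy whose name ends in "(n)", e.g. 0411(1).mp4.
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def transcript_signature(transcript: str) -> str:
    """Hash a normalized transcript so case and punctuation differences are ignored."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def variant_stem(filename: str) -> str:
    """Strip the extension and any trailing "(n)": both 0411.mp4 and 0411(1).mp4 give 0411."""
    stem = filename.rsplit(".", 1)[0]
    return _VARIANT_SUFFIX.sub("", stem).strip()


def dedupe(clips: dict[str, str]) -> tuple[list[str], list[str]]:
    """clips maps filename -> transcript. Returns (kept, dropped) filenames."""
    groups = defaultdict(list)
    for name in sorted(clips):
        groups[transcript_signature(clips[name])].append(name)

    kept, dropped = [], []
    for names in groups.values():
        keeper_stem = variant_stem(names[0])
        for name in names:
            # Same transcript and same stem: an A/B title variant, so keep it.
            # Same transcript but a different stem: a content duplicate, so drop it.
            (kept if variant_stem(name) == keeper_stem else dropped).append(name)
    return sorted(kept), sorted(dropped)
```

With this rule, 0411.mp4 and 0411(1).mp4 both survive, while a differently named file carrying the same transcript is dropped. Exact-match hashing is a simplification: because transcripts are drafts, two transcriptions of the same audio may differ slightly, and a fuzzy comparison may be needed in practice.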

Processed Drive IDs are logged to processed_ids.json, so re-running skips already-seen clips — making it an idempotent automation.
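A rough sketch of how such a log makes re-runs safe is shown here. The file layout (a flat JSON list of ID strings) and the helper names load_processed, save_processed, and run_batch are assumptions for illustration; the skill's real schema is not documented on this page.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_processed(log_path: Path) -> set[str]:
    """Read the set of already-processed Drive IDs; a missing file means none yet."""
    if not log_path.exists():
        return set()
    return set(json.loads(log_path.read_text(encoding="utf-8")))


def save_processed(log_path: Path, ids: set[str]) -> None:
    """Write to a temp file, then swap it in, so a crash never leaves a half-written log."""
    tmp = log_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(ids), indent=2), encoding="utf-8")
    tmp.replace(log_path)


def run_batch(drive_ids: Iterable[str], log_path: Path,
              process: Callable[[str], None]) -> list[str]:
    """Process only unseen IDs, logging each one as soon as it succeeds."""
    done = load_processed(log_path)
    newly_processed = []
    for drive_id in drive_ids:
        if drive_id in done:
            continue  # already handled in an earlier run
        process(drive_id)
        done.add(drive_id)
        save_processed(log_path, done)
        newly_processed.append(drive_id)
    return newly_processed
```

Saving after every clip rather than once at the end means an interrupted batch resumes where it stopped instead of starting over.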

When to use it

  • When a team or creator producing regular short-form content accumulates raw clips in a folder.
  • When each clip needs a quick transcript, caption, and title for a publishing calendar.
  • When A/B title variants must be kept separate rather than deduped away.
  • When you want to remove the manual transcribe-and-copywrite bottleneck.

Method / frameworks

  • Whisper ASR (turbo): local transcription; transcript always a draft.
  • Content-based dedup: transcript-signature comparison, not filename; A/B variant preservation.
  • Curiosity-gap titling: tension/number first, under 60 chars (~50-55 visible on mobile), no over-promising for "Satisfied CTR".
  • Hook-first captions: first person, conversational, no hashtags, 2-4 sentences.
  • Idempotent batch log: processed_ids.json prevents reprocessing.
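The mechanical parts of the title and caption rules above (length, hashtags, sentence count) could be checked with something like the sketch below. The function names and the sentence-splitting heuristic are assumptions; tone, first-person voice, and the curiosity gap itself still need a human or model judgment.

```python
import re


def check_title(title: str, max_chars: int = 60) -> list[str]:
    """Return a list of rule violations for a title (empty list means it passes)."""
    problems = []
    if len(title) >= max_chars:
        problems.append(f"title is {len(title)} chars; must be under {max_chars}")
    return problems


def check_caption(caption: str) -> list[str]:
    """Return a list of rule violations for a caption (empty list means it passes)."""
    problems = []
    if "#" in caption:
        problems.append("caption contains a hashtag")
    # Heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", caption.strip()) if s]
    if not 2 <= len(sentences) <= 4:
        problems.append(f"caption has {len(sentences)} sentence(s); expected 2-4")
    return problems
```

A title can pass the 60-character limit and still be cut off on mobile, where only about 50-55 characters are visible, so it is worth front-loading the tension or number.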

How do I use this skill?

You don't "run" a skill. After installing it, you just tell the agent your task (for example, ask it to caption the clips in a Drive folder), and the skill activates on its own when the task matches its description.

Upload the video-caption-generator.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).

  1. Open Settings → Customize → Skills
  2. Upload → select the video-caption-generator.zip you downloaded
  3. Claude reads SKILL.md; the name + description appear. Ready ✅

Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.