AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.
AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.
Screen Studio / raw footage
→ Claude / Codex
→ FFmpeg
→ Remotion
→ ElevenLabs / fal.ai
→ Descript or CapCut
Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.
Collect the source material:
videodb skill)
Output: raw files ready for organization.
Use Claude Code or Codex to:
Example prompt:
"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."
This layer is about structure, not final creative taste.
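Model-generated cut lists are worth checking before any footage is touched. The following is a minimal sketch, assuming the model's answer has already been turned into `HH:MM:SS` start and end times with a label per segment. The `Cut` class and the `validate_plan` and `to_cuts_file` helpers are hypothetical names introduced for illustration; the output uses the `start,end,label` layout that the batch-cutting script in this document reads.

```python
from dataclasses import dataclass


def to_seconds(ts: str) -> int:
    """Convert an HH:MM:SS timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class Cut:
    start: str  # HH:MM:SS
    end: str    # HH:MM:SS
    label: str  # used as the output filename stem

    @property
    def duration(self) -> int:
        return to_seconds(self.end) - to_seconds(self.start)


def validate_plan(cuts: list[Cut], target_seconds: int, tolerance: int = 60) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks sane."""
    problems = []
    ordered = sorted(cuts, key=lambda c: to_seconds(c.start))
    for cut in ordered:
        if cut.duration <= 0:
            problems.append(f"{cut.label}: end is not after start")
    # Adjacent segments (in source order) must not overlap.
    for prev, nxt in zip(ordered, ordered[1:]):
        if to_seconds(nxt.start) < to_seconds(prev.end):
            problems.append(f"{prev.label} overlaps {nxt.label}")
    total = sum(c.duration for c in cuts)
    if abs(total - target_seconds) > tolerance:
        problems.append(f"total is {total}s, target is {target_seconds}s")
    return problems


def to_cuts_file(cuts: list[Cut]) -> str:
    """Render the plan as start,end,label lines, in the order given."""
    return "".join(f"{c.start},{c.end},{c.label}\n" for c in cuts)
```

An empty list from `validate_plan` only means the plan is internally consistent. Whether the segments are actually the strongest ones is still a human call.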
FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
#!/bin/bash
# cuts.txt: start,end,label
mkdir -p segments
while IFS=, read -r start end label; do
  # -nostdin stops ffmpeg from consuming the remaining lines of cuts.txt
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
# Create file list
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
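The shell loop above produces an invalid list if a filename contains a single quote, and glob order may not match the intended edit order. Here is a minimal sketch of building the same list in Python with explicit ordering and quote escaping. The function names are hypothetical; the `'\''` escape follows FFmpeg's documented quoting rules.

```python
def concat_entry(path: str) -> str:
    """Format one path as a concat-demuxer `file` directive.

    Inside single quotes, FFmpeg expects a literal quote to be written
    as '\\'' (close quote, escaped quote, reopen quote).
    """
    escaped = path.replace("'", "'\\''")
    return f"file '{escaped}'"


def build_concat_list(paths: list[str]) -> str:
    """Render the concat list in exactly the order the paths are given."""
    return "".join(concat_entry(p) + "\n" for p in paths)


if __name__ == "__main__":
    # Order comes from the edit plan, not from alphabetical filename order.
    ordered = ["segments/intro.mp4", "segments/demo.mp4", "segments/outro.mp4"]
    with open("concat.txt", "w", encoding="utf-8") as f:
        f.write(build_concat_list(ordered))
```

The resulting `concat.txt` can be passed to the same `ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4` command.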
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:
import React from "react";
import { AbsoluteFill, Sequence, Video, staticFile } from "remotion";

// Assumes the segment files live under public/segments/ in the Remotion project.
export const VlogComposition: React.FC = () => {
  return (
    <AbsoluteFill>
      {/* Main footage */}
      <Sequence from={0} durationInFrames={300}>
        <Video src={staticFile("segments/intro.mp4")} />
      </Sequence>
      {/* Title overlay */}
      <Sequence from={30} durationInFrames={90}>
        <AbsoluteFill style={{
          justifyContent: "center",
          alignItems: "center",
        }}>
          <h1 style={{
            fontSize: 72,
            color: "white",
            textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
          }}>
            The AI Editing Stack
          </h1>
        </AbsoluteFill>
      </Sequence>
      {/* Next segment */}
      <Sequence from={300} durationInFrames={450}>
        <Video src={staticFile("segments/demo.mp4")} />
      </Sequence>
    </AbsoluteFill>
  );
};
npx remotion render src/index.ts VlogComposition output.mp4
See the Remotion docs for detailed patterns and API reference.
Generate only what you need. Do not generate the whole video.
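import os
import requests

# Assumption: the voice ID comes from an environment variable; substitute your own source.
voice_id = os.environ["ELEVENLABS_VOICE_ID"]

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    },
    timeout=60,
)
# Fail loudly instead of writing an error response to disk as audio.
resp.raise_for_status()
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)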
Use the fal-ai-media skill for:
Use for insert shots, thumbnails, or b-roll that doesn't exist:
generate(app_id: "fal-ai/nano-banana-pro", input_data: {
"prompt": "professional thumbnail for tech vlog, dark background, code on screen",
"image_size": "landscape_16_9"
})
If VideoDB is configured:
voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")
The last layer is human. Use a traditional editor for:
This is where taste lives. AI clears the repetitive work. You make the final calls.
Different platforms need different aspect ratios:
| Platform | Aspect Ratio | Resolution |
|---|---|---|
| YouTube | 16:9 | 1920x1080 |
| TikTok / Reels | 9:16 | 1080x1920 |
| Instagram Feed | 1:1 | 1080x1080 |
| X / Twitter | 16:9 or 1:1 | 1280x720 or 720x720 |
# 16:9 to 9:16 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4
# 16:9 to 1:1 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
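The two commands above assume a landscape source. For other source or target sizes, the filter string can be derived rather than hand-written. This is a minimal sketch with a hypothetical `center_crop_filter` helper. It relies on the `crop` filter centering the window when no x/y offsets are given, and rounds the crop size down to even numbers.

```python
def center_crop_filter(src_w: int, src_h: int, out_w: int, out_h: int) -> str:
    """Build a crop+scale filter that center-crops the source to the out_w:out_h shape."""
    # Compare aspect ratios by cross-multiplying to stay in integer arithmetic.
    if src_w * out_h > src_h * out_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * out_w // out_h
    else:
        # Source is taller than (or equal to) the target: keep full width.
        crop_w = src_w
        crop_h = src_w * out_h // out_w
    # Round down to even values so the crop aligns with 4:2:0 chroma subsampling.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    return f"crop={crop_w}:{crop_h},scale={out_w}:{out_h}"


if __name__ == "__main__":
    # Resolutions taken from the platform table above.
    for name, (w, h) in {"vertical": (1080, 1920), "square": (1080, 1080)}.items():
        vf = center_crop_filter(1920, 1080, w, h)
        print(f'ffmpeg -i input.mp4 -vf "{vf}" {name}.mp4')
```

A center crop is the simplest choice and will cut off subjects that sit near the frame edge. Those shots need a manual offset or a subject-aware reframe.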
from videodb import ReframeMode
# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
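The `silencedetect` output lists where the silence is. Cutting dead air needs the inverse: the ranges to keep. A minimal sketch, assuming the log lines contain `silence_start: <seconds>` and `silence_end: <seconds>` as FFmpeg prints them. The helper names and the `padding` parameter are hypothetical.

```python
import re

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log: str) -> list[tuple[float, float]]:
    """Extract (start, end) pairs in seconds from silencedetect log text."""
    silences = []
    start = None
    for line in log.splitlines():
        match = SILENCE_START.search(line)
        if match:
            # FFmpeg can report a slightly negative start at the head of a file.
            start = max(0.0, float(match.group(1)))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    return silences


def keep_ranges(
    silences: list[tuple[float, float]],
    total_duration: float,
    padding: float = 0.25,
) -> list[tuple[float, float]]:
    """Invert silences into ranges to keep.

    `padding` leaves a little air on each side of a cut. It is assumed to be
    less than half the shortest detected silence (d=2 in the command above).
    """
    ranges = []
    cursor = 0.0
    for s_start, s_end in silences:
        keep_end = min(s_start + padding, total_duration)
        if keep_end > cursor:
            ranges.append((cursor, keep_end))
        cursor = max(cursor, s_end - padding)
    if cursor < total_duration:
        ranges.append((cursor, total_duration))
    return ranges
```

Each kept range can then be written out as a cut and concatenated with the same FFmpeg commands used for segment extraction.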
Use Claude to analyze transcript + scene timestamps:
"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."
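Clip boundaries picked from a transcript rarely land exactly on a visual cut. One option is to snap each proposed boundary to the nearest detected scene change when one is close enough. A minimal sketch, assuming the scene-detection log contains `pts_time:<seconds>` fields as `showinfo` prints them. The helper names and the `max_shift` threshold are hypothetical.

```python
import re
from bisect import bisect_left

PTS_TIME = re.compile(r"pts_time:\s*(\d+(?:\.\d+)?)")


def parse_scene_times(log: str) -> list[float]:
    """Collect scene-change timestamps (seconds) from showinfo log text."""
    return sorted(float(m.group(1)) for m in PTS_TIME.finditer(log))


def snap(t: float, scene_times: list[float], max_shift: float = 1.5) -> float:
    """Move t to the nearest scene change if one lies within max_shift seconds.

    `scene_times` must be sorted ascending. If nothing is close enough,
    t is returned unchanged.
    """
    if not scene_times:
        return t
    i = bisect_left(scene_times, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = scene_times[max(0, i - 1): i + 1]
    nearest = min(candidates, key=lambda s: abs(s - t))
    return nearest if abs(nearest - t) <= max_shift else t


def snap_clip(start: float, end: float, scene_times: list[float]) -> tuple[float, float]:
    """Snap both ends of a proposed clip independently."""
    return snap(start, scene_times), snap(end, scene_times)
```

Snapping changes clip length slightly, so a clip proposed at exactly 30 seconds may come out a little shorter or longer.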
| Tool | Strength | Weakness |
|---|---|---|
| Claude / Codex | Organization, planning, code generation | Not the creative taste layer |
| FFmpeg | Deterministic cuts, batch processing, format conversion | No visual editing UI |
| Remotion | Programmable overlays, composable scenes, reusable templates | Learning curve for non-devs |
| Screen Studio | Polished screen recordings immediately | Only screen capture |
| ElevenLabs | Voice, narration, music, SFX | Not the center of the workflow |
| Descript / CapCut | Final pacing, captions, polish | Manual, not automatable |
- fal-ai-media: AI image, video, and audio generation
- videodb: Server-side video processing, indexing, and streaming
- content-engine: Platform-native content distribution