The definitive skill for "Director-Level" AI video orchestration. Seedance 2.0 is not a descriptive model; it is an instructional model. It responds best to technical cinematography, physics directives, and precise camera grammar.
- seedance-v2.0-t2v
- seedance-v2.0-i2v
- seedance-v2.0-extend
- @tag system (@Image1, @Video1, @Audio1) for style, motion, rhythm, and sound locking.

| Input Type | Limit | Formats | Max Size |
|---|---|---|---|
| Images | ≤ 9 | jpeg, png, webp, bmp, tiff, gif | 30 MB each |
| Videos | ≤ 3 | mp4, mov | 50 MB each, total duration 2–15s |
| Audio | ≤ 3 | mp3, wav | 15 MB each, total duration ≤ 15s |
| Total files | ≤ 12 | — | — |
Output: 4–15 seconds, auto-generated sound effects / background music, 480p–720p.
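The count limits in the table above can be checked before a job is submitted. The following is a minimal sketch, not part of the skill's scripts: the function names `asset_kind` and `check_asset_counts` are hypothetical, treating `.jpg` as `jpeg` is an assumption based on the file names used in this document's examples, and the per-file size and duration limits are not checked.

```shell
#!/bin/sh
# Sketch: pre-flight check of reference files against the documented count limits.
# Function names are illustrative; size and duration limits are not covered.

# Print "image", "video", "audio", or "unknown" based on the file extension.
asset_kind() (
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    # "jpg" is an assumption: the table lists jpeg, the examples use .jpg files.
    jpg|jpeg|png|webp|bmp|tiff|gif) echo image ;;
    mp4|mov)                        echo video ;;
    mp3|wav)                        echo audio ;;
    *)                              echo unknown ;;
  esac
)

# Succeed only if the given files respect the per-type and total count limits.
check_asset_counts() (
  images=0; videos=0; audio=0
  for f in "$@"; do
    case "$(asset_kind "$f")" in
      image) images=$((images + 1)) ;;
      video) videos=$((videos + 1)) ;;
      audio) audio=$((audio + 1)) ;;
      *)     echo "unsupported format: $f" >&2; exit 1 ;;
    esac
  done
  [ "$images" -le 9 ] || { echo "too many images: $images (max 9)" >&2; exit 1; }
  [ "$videos" -le 3 ] || { echo "too many videos: $videos (max 3)" >&2; exit 1; }
  [ "$audio" -le 3 ]  || { echo "too many audio files: $audio (max 3)" >&2; exit 1; }
  total=$((images + videos + audio))
  [ "$total" -le 12 ] || { echo "too many files: $total (max 12)" >&2; exit 1; }
)

check_asset_counts character.jpg reference_motion.mp4 bgm.mp3 && echo "within limits"
```

The total is checked separately because the per-type maxima (9 + 3 + 3) add up to more than the 12-file cap.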
- --mode extend requires a request_id from a previous seedance-v2.0-t2v or seedance-v2.0-i2v job.
- Aspect ratios: 16:9, 9:16, 4:3, 3:4.
- Quality: basic (faster) or high (higher fidelity).

Assign explicit roles to each uploaded asset. This is the most critical part of multimodal prompting.
@Image1 @Image2 @Image3 ... (up to 9)
@Video1 @Video2 @Video3 (up to 3)
@Audio1 @Audio2 @Audio3 (up to 3)
| Purpose | Example Syntax |
|---|---|
| First frame | @Image1 as the first frame |
| Last frame | @Image2 as the last frame |
| Character appearance | @Image1's character as the subject |
| Scene / background | scene references @Image3 |
| Camera movement | reference @Video1's camera movement |
| Action / motion | reference @Video1's action choreography |
| Visual effects | completely reference @Video1's effects and transitions |
| Rhythm / tempo | video rhythm references @Video1 |
| Voice / tone | narration voice references @Video1 |
| Background music | BGM references @Audio1 |
| Sound effects | sound effects reference @Video3's audio |
| Outfit / clothing | wearing the outfit from @Image2 |
| Product appearance | product details reference @Image3 |
@Image1's character as the subject, reference @Video1's camera movement
and action choreography, BGM references @Audio1, scene references @Image2
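A prompt that references a tag with no matching upload is easy to write by accident. The sketch below scans a prompt for `@ImageN`, `@Video N`-style tags (written without the space) and compares each index with the number of files supplied. It assumes tags are numbered from 1 in upload order within each type, which this document implies but does not state outright; `check_tags` is a hypothetical helper, not part of the skill.

```shell
#!/bin/sh
# Sketch: confirm every @ImageN/@VideoN/@AudioN in a prompt has a matching upload.
# Assumes tags are numbered from 1 in upload order within each type.
# Usage: check_tags PROMPT IMAGE_COUNT VIDEO_COUNT AUDIO_COUNT
check_tags() (
  prompt=$1; images=$2; videos=$3; audio=$4
  status=0
  # Collect each distinct tag once, e.g. "@Image1", "@Video1".
  tags=$(printf '%s\n' "$prompt" | grep -oE '@(Image|Video|Audio)[0-9]+' | sort -u)
  for tag in $tags; do
    n=${tag##*[!0-9]}                 # trailing digits, e.g. "2" from "@Image2"
    case "$tag" in
      @Image*) limit=$images ;;
      @Video*) limit=$videos ;;
      @Audio*) limit=$audio ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt "$limit" ]; then
      echo "$tag has no matching upload (only $limit provided)" >&2
      status=1
    fi
  done
  exit "$status"
)

prompt="@Image1's character as the subject, BGM references @Audio1, scene references @Image2"
check_tags "$prompt" 2 0 1 && echo "all tags resolved"
```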
Structure prompts using this hierarchy for professional results:
| Component | Instruction Type | Example |
|---|---|---|
| Scene | Environment + Lighting | "A rain-soaked cyberpunk street, magenta neon reflections on wet asphalt." |
| Subject | Identity + Detail | "A woman in a black trenchcoat, determined focus, cinematic skin textures." |
| Action | Fluid Interaction | "Walking forward through the crowd, coat billowing slightly in the wind." |
| Camera | Movement + Lens | "Medium tracking shot, 35mm lens, slow dolly backward. Subtle handheld jitter." |
| Style | Mood + Intent | "Cinematic epic, warm color grade, shallow DOF, rack focus to subject's face." |
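The five components in the table can be assembled mechanically into the bracket-labeled layout that this document's longer prompt examples use (`[SCENE]`, `[SUBJECT]`, and so on). This is a minimal sketch with a hypothetical helper name, `build_brief`; skipping empty components is a design choice made here, not a documented behavior of the skill.

```shell
#!/bin/sh
# Sketch: assemble a prompt from the five components, in table order.
# Usage: build_brief SCENE SUBJECT ACTION CAMERA STYLE  (pass "" to omit one)
build_brief() (
  for pair in "SCENE:$1" "SUBJECT:$2" "ACTION:$3" "CAMERA:$4" "STYLE:$5"; do
    label=${pair%%:*}   # text before the first colon
    text=${pair#*:}     # text after the first colon (may itself contain colons)
    if [ -n "$text" ]; then
      printf '[%s] %s\n' "$label" "$text"
    fi
  done
)

build_brief \
  "A rain-soaked cyberpunk street, magenta neon reflections on wet asphalt." \
  "A woman in a black trenchcoat, determined focus, cinematic skin textures." \
  "Walking forward through the crowd, coat billowing slightly in the wind." \
  "Medium tracking shot, 35mm lens, slow dolly backward. Subtle handheld jitter." \
  "Cinematic epic, warm color grade, shallow DOF, rack focus to subject's face."
```

The output can then be passed as the `--subject` value.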
Break prompts into timed segments for precise control:
0–3s: [opening scene, camera, action]
3–6s: [mid-section development]
6–10s: [climax or key action]
10–15s: [resolution, ending shot, final text/branding]
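A timed breakdown is easier to trust when its segments are checked for gaps before the prompt is sent. The sketch below assumes the contiguous convention of the template above, where each segment starts at the second the previous one ends, and assumes the total must fall within the 4 to 15 second output range. Some prompts elsewhere in this document leave a one-second step between segments (for example 1–5s, then 6–10s), which this check would reject. `check_segments` is a hypothetical helper and takes ranges written with a plain hyphen.

```shell
#!/bin/sh
# Sketch: verify that timed segments are contiguous and fit the output range.
# Usage: check_segments START-END [START-END ...]   e.g. check_segments 0-3 3-6
check_segments() (
  prev_end=0
  for seg in "$@"; do
    start=${seg%-*}
    end=${seg#*-}
    [ "$start" -eq "$prev_end" ] || { echo "gap or overlap before $seg" >&2; exit 1; }
    [ "$end" -gt "$start" ]      || { echo "empty segment: $seg" >&2; exit 1; }
    prev_end=$end
  done
  # Assumed bounds: the 4 to 15 second output length stated for the model.
  if [ "$prev_end" -lt 4 ] || [ "$prev_end" -gt 15 ]; then
    echo "total length ${prev_end}s is outside 4-15s" >&2
    exit 1
  fi
)

check_segments 0-3 3-6 6-10 10-15 && echo "segments ok"
```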
| Term | Description |
|---|---|
| Push in / Slow push | Camera moves toward subject |
| Pull back / Pull away | Camera moves away from subject |
| Pan left/right | Camera rotates horizontally |
| Tilt up/down | Camera rotates vertically |
| Track / Follow shot | Camera follows subject movement |
| Orbit / Revolve | Camera circles around subject |
| One-take / Oner | Continuous shot with no cuts |
| Term | Description |
|---|---|
| Hitchcock zoom (dolly zoom) | Push in + zoom out — creates vertigo effect |
| Fisheye lens | Ultra-wide distorted lens |
| Low angle / High angle | Camera below/above subject |
| Bird's eye / Overhead | Top-down view |
| First-person POV | Subjective camera from character's eyes |
| Whip pan | Very fast horizontal pan with motion blur |
| Crane shot | Vertical movement like a crane arm |
| Term | Description |
|---|---|
| Extreme close-up | Eyes, mouth, or small detail only |
| Close-up | Face fills frame |
| Medium close-up | Head and shoulders |
| Medium shot | Waist up |
| Full shot | Entire body |
| Wide / Establishing shot | Full environment |
The Agent MUST transform user intent into a technical "Director Brief" before execution.
Use the [00:00-00:05s] format to specify timing.

# Epic reveal shot
bash scripts/generate-seedance.sh \
--subject "a hidden temple in the Andes, mist rolling through the canopy" \
--intent "epic" \
--aspect "16:9" \
--duration 10 \
--quality high \
--view
# Tense close-up, vertical for social
bash scripts/generate-seedance.sh \
--subject "a detective examines a cryptic clue under harsh lamp light" \
--intent "tense" \
--aspect "9:16" \
--duration 5
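The `[00:00-00:05s]` timing format named in the Director Brief requirement can be generated rather than typed by hand. This sketch assumes the two fields are minutes and seconds (MM:SS), which the single example in this document suggests but does not confirm; `timecode` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: format a start/end pair (whole seconds) as "[MM:SS-MM:SSs]".
# The MM:SS reading of the format is an assumption.
timecode() {
  printf '[%02d:%02d-%02d:%02ds]\n' \
    $(( $1 / 60 )) $(( $1 % 60 )) $(( $2 / 60 )) $(( $2 % 60 ))
}

timecode 0 5    # prints [00:00-00:05s]
timecode 5 10   # prints [00:05-00:10s]
```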
Animate one or more reference images. Up to 9 images can be supplied — the model synthesizes motion across all of them.
# Animate a single local image
bash scripts/generate-seedance.sh \
--mode i2v \
--file hero.jpg \
--subject "hero strides forward, coat billowing in slow motion" \
--intent "epic" \
--aspect "16:9" \
--view
# Animate from a URL
bash scripts/generate-seedance.sh \
--mode i2v \
--image "https://example.com/scene.jpg" \
--subject "camera slowly pulls back to reveal the full landscape" \
--intent "reveal" \
--duration 10
# Multi-image blending (character + environment + style reference)
bash scripts/generate-seedance.sh \
--mode i2v \
--file character.jpg \
--file environment.jpg \
--image "https://example.com/style.jpg" \
--subject "character walks through the environment in cinematic style" \
--quality high
# With video and audio references (@-system)
bash scripts/generate-seedance.sh \
--mode i2v \
--file character.jpg \
--video-file reference_motion.mp4 \
--audio-file bgm.mp3 \
--subject "@Image1's character as the subject, reference @Video1's camera movement, BGM references @Audio1" \
--quality high
Continue an existing Seedance 2.0 video seamlessly, preserving visual style, motion, and audio.
# Extend with no new prompt (model continues naturally)
bash scripts/generate-seedance.sh \
--mode extend \
--request-id "abc-123-def-456" \
--duration 10
# Extend with directional prompt
bash scripts/generate-seedance.sh \
--mode extend \
--request-id "abc-123-def-456" \
--subject "camera continues to pull back, revealing the vast city below" \
--intent "reveal" \
--duration 10 \
--quality high \
--view
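The option constraints stated in this document (supported aspect ratios, the two quality levels, the 4 to 15 second output length, and the request ID that extend mode needs) can be checked before calling the script. The following is a sketch under several assumptions: `validate_options` is a hypothetical helper, `t2v` is assumed to be the mode name used when `--mode` is omitted, an empty value is treated as "use the script's default", and the duration range is assumed to apply to extend jobs as well.

```shell
#!/bin/sh
# Sketch: validate option values against the constraints stated in this document.
# Usage: validate_options MODE ASPECT QUALITY DURATION REQUEST_ID
# Pass "" for any option that is left to the script's default.
validate_options() (
  mode=$1; aspect=$2; quality=$3; duration=$4; request_id=$5

  case "$mode" in
    t2v|i2v|extend) ;;   # "t2v" as a mode name is an assumption
    *) echo "unknown mode: $mode" >&2; exit 1 ;;
  esac
  case "$aspect" in
    ""|16:9|9:16|4:3|3:4) ;;
    *) echo "unsupported aspect ratio: $aspect" >&2; exit 1 ;;
  esac
  case "$quality" in
    ""|basic|high) ;;
    *) echo "unsupported quality: $quality" >&2; exit 1 ;;
  esac
  if [ -n "$duration" ]; then
    if [ "$duration" -lt 4 ] || [ "$duration" -gt 15 ]; then
      echo "duration must be 4-15 seconds, got $duration" >&2
      exit 1
    fi
  fi
  if [ "$mode" = extend ] && [ -z "$request_id" ]; then
    echo "extend mode requires a request_id from a previous job" >&2
    exit 1
  fi
  exit 0
)

validate_options extend "" high 10 "abc-123-def-456" && echo "options ok"
```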
# Submit and get request_id immediately
RESULT=$(bash scripts/generate-seedance.sh --mode i2v --file photo.jpg --async --json)
REQUEST_ID=$(echo "$RESULT" | jq -r '.request_id')
# Check later
bash ../../../../core/media/generate-video.sh --result "$REQUEST_ID"
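When a job is submitted with `--async`, the result check usually has to be repeated until the job finishes. The sketch below shows one generic way to do that. It is not taken from the skill: `poll_status` is a hypothetical helper, and the status words `completed` and `failed` are assumptions, since the response shape of the `--result` call is not described in this document. The status command is passed in as an argument so the loop makes no claim about that shape.

```shell
#!/bin/sh
# Sketch: run a status command repeatedly until it reports a terminal state.
# Usage: poll_status INTERVAL_SECONDS MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND must print one status word; "completed" and "failed" are assumed names.
# Exit codes: 0 = completed, 1 = failed, 2 = gave up after MAX_ATTEMPTS.
poll_status() (
  interval=$1; max_attempts=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$@")
    case "$status" in
      completed) echo completed; exit 0 ;;
      failed)    echo failed;    exit 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo timeout
  exit 2
)

# Hypothetical usage: get_status would wrap the --result call shown above and
# print a single status word extracted from its output.
#   poll_status 5 60 get_status "$REQUEST_ID"
```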
The man in @Image1 walks tiredly down the hallway, slowing his steps,
finally stopping at his front door. Close-up on his face — he takes a
deep breath, replaces the weariness with a relaxed expression.
Maintain high character consistency, zero facial flicker, persistent clothing details.
Reference @Image1's male character. He is in @Image2's elevator.
Completely reference @Video1's camera movements and facial expressions.
Hitchcock zoom during the fear moment, then orbit shots of the interior.
Elevator doors open, follow shot walking out.
Extend @Video1 by 10 seconds.
1–5s: Light and shadow slowly slide across table through venetian blinds.
6–10s: A coffee bean drifts down. Camera pushes in toward it until screen goes black.
English text gradually appears — "Lucky Coffee", "Breakfast", "AM 7:00-10:00".
Extend backward 10s. In warm afternoon light, the camera starts from
the corner with awning fluttering in the breeze, slowly tilting down
to flowers peeking out at the wall base, building anticipation for the main scene.
Subvert @Video1's plot — the character's expression shifts from warmth to
cold determination. The action is decisive, without hesitation.
Maintain all other visual elements (scene, lighting, timing).
bash scripts/generate-seedance.sh \
--mode i2v \
--file img1.jpg --file img2.jpg --file img3.jpg \
--video-file reference_edit.mp4 \
--audio-file track.mp3 \
--subject "@Image1 @Image2 @Image3 — match the keyframe positions and rhythm of @Video1 for beat-synced cuts. BGM references @Audio1. More dynamic movement, dreamlike visual style." \
--duration 15 --quality high
In the "Cat & Dog Roast Show" — emotionally expressive comedy segment:
Cat host (licking paw, rolling eyes): "Who understands my suffering?"
Dog host (head tilted, tail wagging): "You're one to talk? You sleep 18 hours a day..."
Sound: lively studio ambience, audience laughter, punchy transitions.
@Image1 @Image2 @Image3 — one-take tracking shot following a runner
from the street up stairs, through a corridor, onto a rooftop,
finally overlooking the city. No cuts throughout.
bash scripts/generate-seedance.sh \
--mode i2v \
--file product.jpg \
--subject "Deconstruct the product. Static camera. Hamburger suspended mid-air, rotating slowly. Ingredients separate and reassemble. Cheese continues to melt and drip. Ultimate food aesthetics." \
--intent "product" \
--aspect "9:16" \
--duration 15 --quality high
bash scripts/generate-seedance.sh \
--subject "15-second health educational clip. 0–5s: Transparent blue human upper body, camera pushes into a clear artery, blood flows smoothly. 5–10s: Sugar and fat particles enter bloodstream, lipid deposits form on vessel walls. 10–15s: Vessel narrows, before/after comparison. 4K medical CGI, semi-transparent visualization." \
--intent "educational" \
--duration 15 --quality high
[SCENE] Rain-soaked cyberpunk alley, neon signs reflected on wet cobblestones.
[SUBJECT] A lone figure in a weathered trench coat, face obscured by a wide-brim hat.
[ACTION] Walking slowly, each step splashing neon color into the puddles.
[CAMERA] Low-angle tracking shot, anamorphic lens, slow dolly in. Rack focus to face.
[STYLE] Denis Villeneuve aesthetic, high contrast, desaturated blues and magentas. 24fps.
Reference @Video1's editing style. Replace @Video1's product with @Image1 as hero.
0–3s: Product enters with dynamic rotation, close-up on surface texture and logo.
4–8s: Multiple angle transitions — front, side, back — with highlight scanning light.
9–12s: Product in lifestyle context showing usage.
13–15s: Hero shot with brand tagline, background music builds to resolution.
Sound: Reference @Video1's BGM. Add product interaction sound effects.
Scene (0–5s): Close-up on character's reddened eyes, finger pointing accusingly.
Dialogue 1: "What exactly are you trying to take from me?"
Scene (6–10s): Other character trembles, holding up evidence, steps forward.
Dialogue 2: "I'm not deceiving you! This is what he entrusted to me!"
Scene (11–15s): Evidence revealed, first character freezes — anger shifts to shock.
Sound: Urgent piano + static interference, sobbing, muffled voice blending in.
Duration: Precise 15 seconds, every frame tight, no filler.
Have the character in @Image1 replicate the dance moves and beat-synced
music from @Video1. Generate a 13-second video. Movements should be
smooth with no stuttering or freezing.
@Image1 @Image2 @Image3 @Image4 @Image5 @Image6 — landscape scene images.
Reference @Video1's visual rhythm, inter-scene transitions, visual style,
and music tempo for beat-synced editing.
[SCENE] Minimalist white studio, single product on a rotating pedestal.
[ACTION] Subtle 360° rotation, product details catching specular highlights.
[CAMERA] Tight medium shot, macro lens pass over surface texture, slow orbit.
[STYLE] Commercial grade, perfect exposure, zero background distraction.
[SCENE] Desert canyon at sunrise, sandy terrain, long shadows.
[SUBJECT] High-performance sports car accelerating through a turn.
[ACTION] Rear wheels spinning with dust plume, chassis flexing under g-force.
[CAMERA] Low hero angle dolly tracking alongside, then whip pan to lead car.
[STYLE] Hollywood racing film, warm golden grade, motion blur on wheels. 24fps.
[SUBJECT] Same fighter throughout: young woman, white gi, black belt, determined expression.
[ACTION] Fluid kata sequence — rising block, stepping side kick, spinning back fist.
[CAMERA] Full-body wide shot, then cut to close-up of fist impact in slow motion.
[STYLE] Maintain identical lighting, clothing, and facial features in every frame. Zero flicker.
Cinematic quality, film grain, shallow depth of field
2.35:1 widescreen, 24fps
Ink wash painting style / Anime style / Photorealistic
High saturation neon colors, cool-warm contrast
4K medical CGI, semi-transparent visualization
Tense and suspenseful / Warm and healing / Epic and grand
Comedy with exaggerated expressions
Documentary tone, restrained narration
Background music: grand and majestic
Sound effects: footsteps, crowd noise, car sounds
Voice tone reference @Video1
Beat-synced transitions matching music rhythm
| Model | Endpoint | Use Case |
|---|---|---|
| seedance-v2.0-t2v | Text-to-Video | Generate from Director Brief |
| seedance-v2.0-i2v | Image-to-Video | Animate 1–9 reference images + video/audio refs |
| seedance-v2.0-extend | Extend Video | Continue a v2.0 generated video |
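The mapping in the table above can be expressed as a small lookup from the `--mode` value to the model ID. This is an illustrative sketch only: `model_for_mode` is a hypothetical helper, and `t2v` is assumed to be the mode name that applies when `--mode` is omitted.

```shell
#!/bin/sh
# Sketch: map a --mode value to the model ID listed in the table.
# "t2v" as a mode name is an assumption; the examples omit --mode for text-to-video.
model_for_mode() {
  case "$1" in
    t2v)    echo seedance-v2.0-t2v ;;
    i2v)    echo seedance-v2.0-i2v ;;
    extend) echo seedance-v2.0-extend ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

model_for_mode i2v   # prints seedance-v2.0-i2v
```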
This skill acts as a Cinematographic Wrapper that translates high-level creative intent into detailed technical instructions for the muapi core.