Wan 3.0 Prompt Guide
Write Prompts That Actually Generate Great Video
The complete guide to structuring prompts for 4K output, AI Director mode, native audio, Identity Lock, and the @reference syntax — with copy-paste templates for every use case.
What Makes a Wan 3.0 Prompt Different?
Wan 3.0 prompts aren't like image prompts or ChatGPT instructions. They control an entire video production pipeline — camera, character, audio, and multi-scene structure — from a single text input. Wan 3.0 is Alibaba's next-generation AI video model that takes text, image, audio, and video as simultaneous inputs and produces a complete 4K audio-visual clip in one generation pass. Your prompt can reference up to 12 uploaded assets tagged directly using @reference syntax.
The Anatomy of a Wan 3.0 Prompt
[Scene Description + Lighting] → Camera [Movement]. → [Subject + Appearance]. → [Audio Tone + SFX]. → @Image1 @Video1 @Audio1

| Segment | Role | Example |
|---|---|---|
| Scene Description | Sets environment, lighting, time of day | rain-soaked NYC street at night, neon reflections |
| Camera Movement | Controls framing and motion | Camera slowly pushes forward |
| Subject + Appearance | Describes the active character or object | woman in a red trench coat, shoulder-length dark hair |
| Audio Tone + SFX | Describes dialogue, ambient, score | Ambient traffic sounds, distant jazz |
| @Reference Tags | Anchors uploaded assets to specific roles | @Image1 @Video1 @Audio1 |
The Wan 3.0 Prompt Formula System
Every strong Wan 3.0 prompt follows a predictable structure. The three formulas below cover single-clip, multi-shot, and reference-anchored generation; learn them and adapt the bracketed fields to each project.
Formula 1 — Base Formula (T2V & I2V)
Template
[Shot Type] of [Subject + Appearance], [Action], [Setting + Lighting]. Camera [Movement]. [Audio description]. [Style note].
Example
“Medium tracking shot of a woman in a red trench coat walking through a rain-soaked NYC street at night, neon reflections on wet pavement. Camera slowly pushes forward. Ambient traffic sounds, distant jazz. Cinematic, 35mm film grain. 4K.”
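The base formula can also be assembled programmatically when many prompts are generated from structured data. The following is a minimal sketch, not an official tool: the `BasePrompt` class and its field names are hypothetical and simply mirror the bracketed slots in the template above.

```python
from dataclasses import dataclass


@dataclass
class BasePrompt:
    """One field per slot of the base formula (names are illustrative)."""
    shot_type: str
    subject: str
    action: str
    setting: str
    camera: str = ""
    audio: str = ""
    style: str = ""

    def render(self) -> str:
        # First sentence: [Shot Type] of [Subject], [Action], [Setting + Lighting]
        sentences = [
            f"{self.shot_type} of {self.subject}, {self.action}, {self.setting}"
        ]
        if self.camera:
            sentences.append(f"Camera {self.camera}")
        sentences += [self.audio, self.style]
        # Skip empty slots and end every sentence with exactly one period.
        return " ".join(
            s.strip().rstrip(".") + "." for s in sentences if s.strip()
        )


prompt = BasePrompt(
    shot_type="Medium tracking shot",
    subject="a woman in a red trench coat",
    action="walking through a rain-soaked NYC street at night",
    setting="neon reflections on wet pavement",
    camera="slowly pushes forward",
    audio="Ambient traffic sounds, distant jazz",
    style="Cinematic, 35mm film grain. 4K",
)
print(prompt.render())
```

Keeping each slot as a separate field makes it easy to spot a missing camera or audio description before the prompt is submitted.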
Formula 2 — AI Director Mode (Multi-Shot)
Template
Shot 1 [0–5s]: [Shot type] — [Scene content].
Shot 2 [5–10s]: [Shot type] — [Scene content].
Shot 3 [10–18s]: [Shot type] — [Scene content].
Shot 4 [18–24s]: [Shot type] — [Scene content].
Overall tone: [audio/mood].
💡 Each shot gets its own shot type and duration. Wan 3.0 handles transitions and cross-shot character consistency automatically via Identity Lock.
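Because each shot's time range has to start where the previous one ends, it can help to compute the ranges from durations rather than typing them by hand. The sketch below is an assumption-laden helper, not part of Wan 3.0: `build_director_prompt` is a hypothetical function, and the 6-shot and 30-second limits are taken from the comparison table later in this guide.

```python
from typing import List, Tuple

# Limits as listed in this guide's comparison table (6 shots, 30 seconds).
MAX_SHOTS = 6
MAX_SECONDS = 30

EN_DASH = "\u2013"  # separates start and end times, as in the template
EM_DASH = "\u2014"  # separates shot type from scene content, as in the template

Shot = Tuple[int, str, str]  # (duration in seconds, shot type, scene content)


def build_director_prompt(shots: List[Shot], tone: str) -> str:
    """Render shots as back-to-back timed lines plus an overall tone line."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")

    lines = []
    start = 0
    for number, (duration, shot_type, content) in enumerate(shots, start=1):
        if duration <= 0:
            raise ValueError(f"shot {number} needs a positive duration")
        end = start + duration
        lines.append(
            f"Shot {number} [{start}{EN_DASH}{end}s]: "
            f"{shot_type} {EM_DASH} {content.rstrip('.')}."
        )
        start = end  # the next shot begins where this one ends

    if start > MAX_SECONDS:
        raise ValueError(f"total length {start}s exceeds {MAX_SECONDS}s")

    lines.append(f"Overall tone: {tone.rstrip('.')}.")
    return "\n".join(lines)


print(build_director_prompt(
    [
        (5, "Wide establishing", "fog-covered harbor pier at sunrise"),
        (5, "Medium", "man in a navy wool coat walks toward the water"),
    ],
    tone="low foghorns, gentle waves",
))
```

Passing durations instead of absolute timestamps means that reordering or trimming a shot never leaves a gap or overlap in the timeline.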
Formula 3 — Reference-Anchored Prompt
Template
[Scene description]. Character appearance: @Image1. Camera style reference: @Video1. Background music tone: @Audio1. [Any additional style notes].
💡 Tag each reference by type and number: Image1 through Image9, Video1 through Video3, Audio1 through Audio3. Wan 3.0 resolves each tag to the corresponding uploaded asset.
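A quick pre-flight check can catch mistyped or out-of-range tags before a generation is spent on them. The following is a sketch under stated assumptions: `check_references` is a hypothetical helper, the per-type ranges come from the tip above, and the overall cap of 12 assets comes from the introduction of this guide. How Wan 3.0 itself reports an invalid tag is not covered by the source.

```python
import re
from typing import List

# Per-type slot ranges and the overall cap, as stated in this guide.
LIMITS = {"Image": 9, "Video": 3, "Audio": 3}
MAX_ASSETS = 12


def check_references(prompt: str) -> List[str]:
    """Return a list of problems with @tags; an empty list means none found."""
    problems = []
    seen = set()
    for tag in re.findall(r"@(\w+)", prompt):
        match = re.fullmatch(r"(Image|Video|Audio)(\d+)", tag)
        if match is None:
            problems.append(f"@{tag} is not an Image/Video/Audio tag")
            continue
        kind, number = match.group(1), int(match.group(2))
        if not 1 <= number <= LIMITS[kind]:
            problems.append(
                f"@{tag} is outside {kind}1 through {kind}{LIMITS[kind]}"
            )
        seen.add((kind, number))
    if len(seen) > MAX_ASSETS:
        problems.append(
            f"{len(seen)} assets referenced, maximum is {MAX_ASSETS}"
        )
    return problems


print(check_references(
    "Character appearance: @Image1. Camera style reference: @Video4."
))
```

Returning a list of messages rather than raising on the first error lets every tag problem in a long prompt be fixed in one pass.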
Wan 3.0 Prompt Writing by Mode and Feature
Each generation mode has its own prompt logic. Here's what to include, what to skip, and a ready-to-use example for each.
Text-to-Video (T2V)
Priority order: Shot type → Subject appearance → Action → Environment + Lighting → Camera movement → Audio
T2V prompts carry all the visual information: Wan 3.0 has no image to reference, so the prompt does all the heavy lifting. Lead with the shot type, then the subject with physical details, the action, and the environment with lighting; close with camera movement and audio.
Example Prompt
Wide shot of a man in a navy wool coat standing at a fog-covered harbor pier at sunrise, gazing outward. Camera slowly cranes upward to reveal the full skyline. Low distant foghorns, gentle wave sounds, orchestral score swells softly. Cinematic. 4K.
Copy-Paste Wan 3.0 Templates by Use Case
Real prompts for real projects. Adjust the bracketed fields and generate.
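When the same template is reused across many products or locations, the bracketed fields can be filled in code. This is a minimal sketch with a hypothetical `fill_template` helper; it assumes that every `[...]` span is a placeholder except time ranges such as `[0–5s]`, which are part of the multi-shot syntax and are left untouched.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")
# Bracketed time ranges such as [0–5s] belong to the prompt and are left alone.
TIME_RANGE = re.compile(r"\d+\s*[\u2013-]\s*\d+s")


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace [field] placeholders; raise if any field has no value."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if TIME_RANGE.fullmatch(name):
            return match.group(0)  # keep shot timing as written
        if name in values:
            return values[name]
        missing.append(name)
        return match.group(0)

    filled = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError(f"unfilled fields: {', '.join(missing)}")
    return filled


template = (
    "Slow-motion product reveal of a [product name] rotating on a white "
    "marble pedestal, studio lighting, subtle smoke rising."
)
print(fill_template(template, {"product name": "ceramic pour-over kettle"}))
```

Raising on unfilled fields prevents a literal `[product name]` from being sent to the model by accident.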
Product Commercial
Slow-motion product reveal of a [product name] rotating on a white marble pedestal, studio lighting, subtle smoke rising. Camera orbits clockwise at 45-degree angle. Close-up on key design details. Audio: deep cinematic bass hum, subtle product sound design. Luxury brand aesthetic. 4K.
Social Media Vertical (TikTok / Reels)
Vertical 9:16 frame. Close-up of [subject action] in [setting], natural light. Quick handheld energy. Trending upbeat lo-fi music, no dialogue. Text overlay space at bottom. Vibrant warm color grade. 10 seconds. 4K.
Short Film / Narrative
Shot 1 [0–5s]: Wide establishing — [location], [time of day].
Shot 2 [5–12s]: Medium — [character] enters frame, [action].
Shot 3 [12–20s]: Close-up — [emotional beat or object].
Shot 4 [20–25s]: Wide pullback — [resolution or cliffhanger].
Audio: sparse [genre] score, minimal dialogue.
E-Commerce Product Demo
Product reference: @Image1. The product is unboxed on a clean white surface, hands gently lift it into frame. Camera slowly pushes in as the product is held up for inspection. Audio: satisfying unboxing sounds — tissue paper, crisp packaging. Bright, clean studio lighting. 4K.
Corporate Brand Video
Shot 1 [0–6s]: Aerial — [company headquarters or cityscape], sunrise.
Shot 2 [6–14s]: Medium — team members collaborating around a table, natural office light.
Shot 3 [14–22s]: Close-up — hands at keyboard, focused work.
Shot 4 [22–30s]: Wide — group product launch moment, smiles.
Brand color: @Image1 (logo reference). Audio: corporate orchestral lift, no dialogue.
Multilingual Lip Sync Ad
Character appearance: @Image1. Medium close-up — character speaks directly to camera in a modern, softly lit home studio setting. Phoneme-accurate lip sync to @Audio1. Audio: natural room tone, slight reverb. Clean background, no distractions. 4K. Suitable for 12-language multilingual ad campaign.
Wan 3.0 Prompt Keyword Cheat Sheet
Copy these keywords and paste them directly into your Wan 3.0 prompt to generate better results.
Shot Types
Camera Movement
Lighting Style
Audio Description
Visual Style
@reference Syntax
Weak Prompts vs. Strong Wan 3.0 Prompts
The difference between a generic output and a cinematic one usually comes down to five specific things.
| Use Case | Weak Prompt | Strong Prompt |
|---|---|---|
| Scene Description | A woman walking in a city | Medium tracking shot of a woman in a red coat walking through a rain-soaked NYC street at night, neon reflections on wet pavement. Camera slowly pushes forward. Ambient traffic sounds. Cinematic, film grain. 4K. |
| Product Video | Product video of a shoe | Slow-motion product reveal of a running shoe rotating on a white pedestal, studio lighting, white background. Camera orbits at 45-degree angle. Deep bass design sound. Luxury aesthetic. 4K. |
| Location Scene | A coffee shop scene | Wide shot of a cozy independent coffee shop interior at morning, warm amber lighting, steam rising from cups, soft rain on windows. Static camera. Background chatter, jazz piano, espresso machine hiss (diegetic). Documentary feel. 4K. |
| Multi-Shot | Multi-shot ad video | Shot 1 [0–5s]: Extreme close-up — product on ice, droplets. Shot 2 [5–12s]: Medium — person lifts product, refreshed. Shot 3 [12–20s]: Wide — friends outdoors at golden hour. Shot 4 [20–25s]: Close pour shot. Audio: upbeat ambient, crowd energy. |
| Lip Sync | Woman speaking to camera | Character appearance: @Image1. Medium close-up — character speaks directly to camera in a softly lit home studio. Phoneme-accurate lip sync to @Audio1. Natural room tone, minimal reverb. Clean background. 4K. |
Wan 3.0 vs. Sora, Kling 3.0, and Seedance 2.0
Knowing where Wan 3.0 leads helps you write prompts that take full advantage of its unique capabilities.
| Feature | Wan 3.0 | Sora 2 | Kling 3.0 | Seedance 2.0 |
|---|---|---|---|---|
| Max Resolution | 4K Native | 1080P | 4K | 2K |
| Max Duration | 30 sec | 25 sec | 15 sec | 15 sec |
| Native Audio | ✓ | — | — | — |
| Multi-Shot Director | 6 shots | — | 6 shots | — |
| Reference Inputs | 12 assets | Limited | Video ref | 12 assets |
| Identity Lock | ✓ Cross-session | — | — | — |
| Lip Sync Precision | Phoneme-level | — | Good | Phoneme-level |
| Multilingual Text | 12 languages | Limited | Limited | 8 languages |
| Video Continuation | ✓ | — | — | — |
| Commercial License | ✓ Included | ✓ | ✓ | ✓ |
Bottom line: Wan 3.0 is the strongest choice for production teams that need narrative length, multilingual output, and brand-accurate control across a full campaign — not just isolated high-quality clips.
You've Got the Guide. Now Generate.
Open Wan 3.0, paste a template from this guide, and produce your first 4K clip with synchronized audio in under two minutes.