Complete Guide · 2026

Wan 3.0 Prompt Guide
Write Prompts That Actually Generate Great Video

The complete guide to structuring prompts for 4K output, AI Director mode, native audio, Identity Lock, and the @reference syntax — with copy-paste templates for every use case.

4K Native Output · 30-Second Single-Pass Generation · 6-Shot AI Director Mode · Up to 12 @reference Assets Per Generation · Native Stereo Audio, No Post Sync Required
Prompt Anatomy

What Makes a Wan 3.0 Prompt Different?

Wan 3.0 prompts aren't like image prompts or ChatGPT instructions. They control an entire video production pipeline — camera, character, audio, and multi-scene structure — from a single text input. Wan 3.0 is Alibaba's next-generation AI video model that takes text, image, audio, and video as simultaneous inputs and produces a complete 4K audio-visual clip in one generation pass. Your prompt can reference up to 12 uploaded assets tagged directly using @reference syntax.

The Anatomy of a Wan 3.0 Prompt

[Scene Description + Lighting] → Camera [Movement]. [Subject + Appearance]. [Audio Tone + SFX]. @Image1 @Video1 @Audio1
Segment | Role
Scene Description | Sets environment, lighting, time of day
Camera Movement | Controls framing and motion
Subject + Appearance | Describes the active character or object
Audio Tone + SFX | Describes dialogue, ambient, score
@Reference Tags | Anchors uploaded assets to specific roles
Supported input modes: Text-to-Video (T2V) · Image-to-Video (I2V) · AI Director Mode (multi-shot) · Native Audio Prompt · Identity Lock · Regional Editing · @reference syntax
The Formula

The Wan 3.0 Prompt Formula System

Every strong Wan 3.0 prompt follows a predictable structure. Commit these three formulas to muscle memory and your output quality will jump immediately.

Formula 1 — Base Formula (T2V & I2V)

Template

[Shot Type] of [Subject + Appearance], [Action], [Setting + Lighting]. Camera [Movement]. [Audio description]. [Style note].

Example

Medium tracking shot of a woman in a red trench coat walking through a rain-soaked NYC street at night, neon reflections on wet pavement. Camera slowly pushes forward. Ambient traffic sounds, distant jazz. Cinematic, 35mm film grain. 4K.
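
The base formula can be treated as a fill-in-the-fields structure. The sketch below is one possible way to assemble it in Python; the `BasePrompt` class and its field names are illustrative assumptions, not part of any Wan 3.0 tooling, and the output is only a text string to paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class BasePrompt:
    """Fields of the base formula. All names here are hypothetical."""
    shot_type: str
    subject: str
    action: str
    setting: str
    camera: str
    audio: str
    style: str

    def render(self) -> str:
        # Mirrors the template:
        # [Shot Type] of [Subject + Appearance], [Action], [Setting + Lighting].
        # Camera [Movement]. [Audio description]. [Style note].
        scene = f"{self.shot_type} of {self.subject}, {self.action}, {self.setting}."
        parts = [scene, f"Camera {self.camera}.", f"{self.audio}.", f"{self.style}."]
        return " ".join(parts)


prompt = BasePrompt(
    shot_type="Medium tracking shot",
    subject="a woman in a red trench coat",
    action="walking through a rain-soaked NYC street at night",
    setting="neon reflections on wet pavement",
    camera="slowly pushes forward",
    audio="Ambient traffic sounds, distant jazz",
    style="Cinematic, 35mm film grain. 4K",
)
print(prompt.render())
```

Keeping each field separate makes it easy to vary one element, such as the camera movement, while holding the rest of the prompt constant.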

Formula 2 — AI Director Mode (Multi-Shot)

Template

Shot 1 [0–5s]: [Shot type] — [Scene content].
Shot 2 [5–10s]: [Shot type] — [Scene content].
Shot 3 [10–18s]: [Shot type] — [Scene content].
Shot 4 [18–24s]: [Shot type] — [Scene content].
Overall tone: [audio/mood].

💡Each shot gets its own shot type and duration. Wan 3.0 handles transitions and cross-shot character consistency automatically via Identity Lock.
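
A shot list in this format is easy to get wrong by hand, for example by leaving a gap between time ranges. The following is a minimal sketch of a helper that formats shots and checks them against the limits stated in this guide (6 shots, 30 seconds). The function name and the tuple layout are assumptions made for illustration, and whether Wan 3.0 itself rejects gaps or overlaps is not stated here.

```python
MAX_SHOTS = 6      # from "6-Shot AI Director Mode" in this guide
MAX_SECONDS = 30   # from "30-Second Single-Pass Generation" in this guide


def build_director_prompt(shots, tone):
    """Format (start_s, end_s, shot_type, content) tuples as a multi-shot prompt."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    lines = []
    expected_start = 0
    for number, (start, end, shot_type, content) in enumerate(shots, start=1):
        # Each shot must begin exactly where the previous one ended.
        if start != expected_start:
            raise ValueError(f"shot {number} should start at {expected_start}s")
        if end <= start:
            raise ValueError(f"shot {number} must end after it starts")
        lines.append(f"Shot {number} [{start}–{end}s]: {shot_type} — {content}.")
        expected_start = end
    if expected_start > MAX_SECONDS:
        raise ValueError(f"total length {expected_start}s exceeds {MAX_SECONDS}s")
    lines.append(f"Overall tone: {tone}.")
    return "\n".join(lines)


print(build_director_prompt(
    [(0, 5, "Wide", "harbor at sunrise"), (5, 10, "Close-up", "hands on a rope")],
    "calm ambient score",
))
```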

Formula 3 — Reference-Anchored Prompt

Template

[Scene description]. Character appearance: @Image1. Camera style reference: @Video1. Background music tone: @Audio1. [Any additional style notes].

💡Tag each reference by type and number: @Image1 through @Image9, @Video1 through @Video3, @Audio1 through @Audio3, with no more than 12 assets in total per generation. Wan 3.0 resolves each tag to the corresponding uploaded asset.
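
Before submitting a reference-anchored prompt, it can help to check the tags mechanically. The sketch below scans a prompt for @reference tags and flags any that fall outside the numbering ranges listed in this guide or exceed the 12-asset total. The `check_references` function is a hypothetical helper, and the tag pattern is an assumption based on the examples shown here rather than a documented grammar.

```python
import re

# Per-type ranges and the overall cap as stated in this guide.
TYPE_LIMITS = {"Image": 9, "Video": 3, "Audio": 3}
MAX_TOTAL = 12
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def check_references(prompt):
    """Return a list of problems found in the prompt's @reference tags."""
    # A set removes repeats, so a tag used twice counts as one asset.
    tags = {(kind, int(num)) for kind, num in TAG_PATTERN.findall(prompt)}
    problems = []
    for kind, num in sorted(tags):
        if not 1 <= num <= TYPE_LIMITS[kind]:
            problems.append(f"@{kind}{num} is outside 1..{TYPE_LIMITS[kind]}")
    if len(tags) > MAX_TOTAL:
        problems.append(f"{len(tags)} distinct assets exceed the limit of {MAX_TOTAL}")
    return problems


prompt = (
    "Rainy street at night. Character appearance: @Image1. "
    "Camera style reference: @Video1. Background music tone: @Audio1."
)
print(check_references(prompt))
```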

By Mode

Wan 3.0 Prompt Writing by Mode and Feature

Each generation mode has its own prompt logic. Here's what to include, what to skip, and a ready-to-use example for each.

Text-to-Video (T2V)

Priority order: Shot type → Subject appearance → Action → Environment + Lighting → Camera movement → Audio

In T2V, Wan 3.0 has no image to reference, so the prompt carries all of the visual information. Lead with the shot type, then give the subject's physical details, the action, and the environment with its lighting; close with camera movement and audio.

Example Prompt

Wide shot of a man in a navy wool coat standing at a fog-covered harbor pier at sunrise, gazing outward. Camera slowly cranes upward to reveal the full skyline. Low distant foghorns, gentle wave sounds, orchestral score swells softly. Cinematic. 4K.
Templates

Copy-Paste Wan 3.0 Templates by Use Case

Real prompts for real projects. Adjust the bracketed fields and generate.

Product Commercial

T2V · Commercial ✓
4K · 15–30s · 16:9
Slow-motion product reveal of a [product name] rotating on a white marble pedestal, studio lighting, subtle smoke rising. Camera orbits clockwise at 45-degree angle. Close-up on key design details. Audio: deep cinematic bass hum, subtle product sound design. Luxury brand aesthetic. 4K.

Social Media Vertical (TikTok / Reels)

T2V · Commercial ✓
4K · 10s · 9:16
Vertical 9:16 frame. Close-up of [subject action] in [setting], natural light. Quick handheld energy. Trending upbeat lo-fi music, no dialogue. Text overlay space at bottom. Vibrant warm color grade. 10 seconds. 4K.

Short Film / Narrative

AI Director
4K · up to 25s
Shot 1 [0–5s]: Wide establishing — [location], [time of day].
Shot 2 [5–12s]: Medium — [character] enters frame, [action].
Shot 3 [12–20s]: Close-up — [emotional beat or object].
Shot 4 [20–25s]: Wide pullback — [resolution or cliffhanger].
Audio: sparse [genre] score, minimal dialogue.
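
The bracketed fields in templates like the one above can also be filled programmatically. This is a rough sketch, assuming that every bracketed span is a placeholder except shot timings such as [0–5s]; the `fill_template` helper and the sample values are illustrative only. It returns the filled text along with any fields that were left unfilled, so a half-edited template is not submitted by accident.

```python
import re

# Any [bracketed] span is a candidate field; shot timings are skipped below.
FIELD = re.compile(r"\[([^\[\]]+)\]")
TIME_RANGE = re.compile(r"^\d+\s*[–-]\s*\d+s$")


def fill_template(template, values):
    """Replace [field] placeholders; return (text, list of unfilled fields)."""
    unfilled = []

    def substitute(match):
        name = match.group(1)
        if TIME_RANGE.match(name):
            return match.group(0)  # keep shot timing such as [0–5s] untouched
        if name in values:
            return values[name]
        unfilled.append(name)
        return match.group(0)      # leave unknown fields visible in the output

    return FIELD.sub(substitute, template), unfilled


template = (
    "Shot 1 [0–5s]: Wide establishing — [location], [time of day].\n"
    "Shot 2 [5–12s]: Medium — [character] enters frame, [action]."
)
text, missing = fill_template(
    template,
    {"location": "a coastal village", "time of day": "dusk", "character": "a fisherman"},
)
print(text)
print("Unfilled:", missing)
```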

E-Commerce Product Demo

I2V
Requires @Image1 (product photo)
Product reference: @Image1. The product is unboxed on a clean white surface, hands gently lift it into frame. Camera slowly pushes in as the product is held up for inspection. Audio: satisfying unboxing sounds — tissue paper, crisp packaging. Bright, clean studio lighting. 4K.

Corporate Brand Video

AI Director
4K · 30s
Shot 1 [0–6s]: Aerial — [company headquarters or cityscape], sunrise.
Shot 2 [6–14s]: Medium — team members collaborating around a table, natural office light.
Shot 3 [14–22s]: Close-up — hands at keyboard, focused work.
Shot 4 [22–30s]: Wide — group product launch moment, smiles.
Brand color: @Image1 (logo reference). Audio: corporate orchestral lift, no dialogue.

Multilingual Lip Sync Ad

I2V
Requires @Image1 + @Audio1 · Up to 12 languages
Character appearance: @Image1. Medium close-up — character speaks directly to camera in a modern, softly lit home studio setting. Phoneme-accurate lip sync to @Audio1. Audio: natural room tone, slight reverb. Clean background, no distractions. 4K. Suitable for 12-language multilingual ad campaign.
Quick Reference

Wan 3.0 Prompt Keyword Cheat Sheet

Copy these keywords and paste them directly into your Wan 3.0 prompt to generate better results.

Shot Types

Camera Movement

Lighting Style

Audio Description

Visual Style

@reference Syntax

Before & After

Weak Prompts vs. Strong Wan 3.0 Prompts

The difference between a generic output and a cinematic one usually comes down to five specific things.

Scene Description

❌ Weak

A woman walking in a city

✓ Strong

Medium tracking shot of a woman in a red coat walking through a rain-soaked NYC street at night, neon reflections on wet pavement. Camera slowly pushes forward. Ambient traffic sounds. Cinematic, film grain. 4K.

Product Video

❌ Weak

Product video of a shoe

✓ Strong

Slow-motion product reveal of a running shoe rotating on a white pedestal, studio lighting, white background. Camera orbits at 45-degree angle. Deep bass sound design. Luxury aesthetic. 4K.

Location Scene

❌ Weak

A coffee shop scene

✓ Strong

Wide shot of a cozy independent coffee shop interior in the morning, warm amber lighting, steam rising from cups, soft rain on windows. Static camera. Background chatter, jazz piano, espresso machine hiss (diegetic). Documentary feel. 4K.

Multi-Shot

❌ Weak

Multi-shot ad video

✓ Strong

Shot 1 [0–5s]: Extreme close-up — product on ice, droplets.
Shot 2 [5–12s]: Medium — person lifts product, refreshed.
Shot 3 [12–20s]: Wide — friends outdoors at golden hour.
Shot 4 [20–25s]: Close pour shot.
Audio: upbeat ambient, crowd energy.

Lip Sync

❌ Weak

Woman speaking to camera

✓ Strong

Character appearance: @Image1. Medium close-up — character speaks directly to camera in a softly lit home studio. Phoneme-accurate lip sync to @Audio1. Natural room tone, minimal reverb. Clean background. 4K.

Competitive Context

Wan 3.0 vs. Sora 2, Kling 3.0, and Seedance 2.0

Knowing where Wan 3.0 leads helps you write prompts that take full advantage of its unique capabilities.

Feature | Wan 3.0 | Sora 2 | Kling 3.0 | Seedance 2.0
Max Resolution | 4K Native | 1080p | 4K | 2K
Max Duration | 30 seconds | 25 sec | 15 sec | 15 sec
Native Audio
Multi-Shot Director6 shots6 shots
Reference Inputs | 12 assets | Limited | Video ref | 12 assets
Identity Lock | ✓ Cross-session
Lip Sync PrecisionPhoneme-levelGoodPhoneme-level
Multilingual Text | 12 languages | Limited | Limited | 8 languages
Video Continuation
Commercial License | ✓ Included

Bottom line: Wan 3.0 is the strongest choice for production teams that need narrative length, multilingual output, and brand-accurate control across a full campaign — not just isolated high-quality clips.

You've Got the Guide. Now Generate.

Open Wan 3.0, paste a template from this guide, and produce your first 4K clip with synchronized audio in under two minutes.