Steering Video Diffusion Transformers
with Massive Activations

Xianhang Cheng1, Yujian Zheng1, Zhenyu Xie1, Tingting Liao1, Hao Li1,2
1MBZUAI    2Pinscreen

Abstract

Despite rapid progress in video diffusion transformers, how their internal signals can be leveraged with minimal overhead to enhance video generation quality remains underexplored. In this work, we study the role of Massive Activations (MAs): rare, high-magnitude hidden-state spikes in video diffusion transformers. We observe that MAs emerge consistently across all visual tokens, with a clear magnitude hierarchy: first-frame tokens exhibit the largest MA magnitudes; latent-frame boundary tokens (the head and tail portions of each temporal chunk in the latent space) show elevated but slightly lower magnitudes; and interior tokens within each latent frame remain elevated but comparatively moderate in magnitude. This structured pattern suggests that the model implicitly prioritizes token positions aligned with the temporal chunking of the latent space. Based on this observation, we propose Structured Activation Steering (STAS), a training-free, self-guidance-like method that steers MA values at first-frame and boundary tokens toward a scaled global-maximum reference magnitude. STAS achieves consistent improvements in video quality and temporal coherence across different text-to-video models while introducing negligible computational overhead.
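To make the idea concrete, below is a minimal PyTorch sketch of what such steering could look like inside a diffusion-transformer block. Everything here is our own illustrative assumption rather than the paper's exact procedure: the function name stas_steer, the threshold-based MA detection (tau), the scaling factor alpha, and the sign-preserving rescaling are all hypothetical.

import torch

def stas_steer(hidden, first_mask, boundary_mask, alpha=0.9, tau=50.0):
    """Hypothetical sketch of Structured Activation Steering (STAS).

    hidden:        [B, N, D] hidden states over N visual tokens.
    first_mask:    [N] bool, True at first-frame token positions.
    boundary_mask: [N] bool, True at latent-chunk boundary positions.
    alpha:         scale applied to the global-maximum reference.
    tau:           magnitude threshold used to detect MAs (assumed).
    """
    # Detect massive-activation entries: rare, high-magnitude spikes.
    ma_mask = hidden.abs() > tau                             # [B, N, D]

    # Global maximum activation magnitude per sample (the reference),
    # scaled by alpha to form the steering target.
    target = alpha * hidden.abs().amax(dim=(1, 2), keepdim=True)  # [B, 1, 1]

    # Token positions to steer: first-frame and chunk-boundary tokens.
    steer_pos = (first_mask | boundary_mask).view(1, -1, 1)  # [1, N, 1]

    # Rescale MA entries at the selected positions to the target
    # magnitude, preserving sign; leave all other entries untouched.
    steer = ma_mask & steer_pos
    return torch.where(steer, torch.sign(hidden) * target, hidden)

# Example with made-up shapes: 2 videos, 1024 visual tokens, width 3072.
hidden = torch.randn(2, 1024, 3072)
first = torch.zeros(1024, dtype=torch.bool); first[:256] = True
boundary = torch.zeros(1024, dtype=torch.bool); boundary[256::256] = True
out = stas_steer(hidden, first, boundary)

The sign-preserving rescaling keeps each steered feature's direction while equalizing its magnitude against the scaled global maximum, and restricting the edit to first-frame and boundary tokens mirrors the magnitude hierarchy described above; since it only rewrites a handful of entries at inference time, it adds essentially no compute, consistent with the training-free, negligible-overhead claim.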


Results

Side-by-side video comparisons: Baseline (left) vs. +Ours (right).

Wan2.1-1.3B


CogVideoX-5B


Wan2.2-5B


BibTeX

Please consider citing our paper if you find it useful in your research.

@article{cheng2026stas,
  title={Steering Video Diffusion Transformers with Massive Activations},
  author={Cheng, Xianhang and Zheng, Yujian and Xie, Zhenyu and Liao, Tingting and Li, Hao},
  journal={arXiv preprint arXiv:2603.17825},
  year={2026}
}