Despite rapid progress in video diffusion transformers, how their internal model signals can be leveraged with
minimal overhead to enhance video generation quality remains underexplored. In this work, we study the role of
Massive Activations (MAs), which are rare, high-magnitude hidden state spikes in video diffusion
transformers. We observe that MAs emerge consistently across all visual tokens, with a clear magnitude hierarchy:
first-frame tokens exhibit the largest MA magnitudes, latent-frame boundary tokens (the head and tail portions of
each temporal chunk in the latent space) show elevated but slightly lower MA magnitudes than the first frame, and
interior tokens within each latent frame remain elevated, yet are comparatively moderate in magnitude. This
structured pattern suggests that the model implicitly prioritizes token positions aligned with the temporal
chunking in the latent space. Based on this observation, we propose Structured Activation Steering (STAS),
a training-free self-guidance-like method that steers MA values at first-frame and boundary tokens
toward a scaled global maximum reference magnitude. STAS achieves consistent improvements in terms of video
quality and temporal coherence across different text-to-video models, while introducing negligible computational
overhead.
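The steering step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it is hypothetical: `steer_massive_activations`, the scale `alpha`, the `boundary_width` parameter (how many head and tail tokens of each latent frame count as boundary tokens), and the assumption that the MA channels `ma_dims` are already known. The sketch also assumes tokens are ordered frame by frame, and it sets the selected values directly to the scaled reference while keeping their sign, which is only one possible reading of "steers toward".

```python
import numpy as np


def steer_massive_activations(hidden, num_frames, ma_dims, alpha=1.0, boundary_width=1):
    """Return a copy of `hidden` with MA channels steered at selected tokens.

    hidden: array of shape (num_frames * tokens_per_frame, dim), tokens
            ordered frame by frame (assumption).
    ma_dims: channel indices where massive activations occur (assumed known).
    alpha: hypothetical scale applied to the global maximum magnitude.
    boundary_width: hypothetical number of head/tail tokens per latent frame
                    treated as boundary tokens.
    """
    num_tokens = hidden.shape[0]
    if num_tokens % num_frames != 0:
        raise ValueError("token count must be divisible by num_frames")
    tokens_per_frame = num_tokens // num_frames

    token_idx = np.arange(num_tokens)
    frame = token_idx // tokens_per_frame   # latent frame of each token
    pos = token_idx % tokens_per_frame      # position inside that frame

    first_frame = frame == 0
    boundary = (pos < boundary_width) | (pos >= tokens_per_frame - boundary_width)
    selected = first_frame | boundary

    out = hidden.copy()
    for d in ma_dims:
        # Reference magnitude: scaled global maximum over all tokens in channel d.
        ref = alpha * np.abs(hidden[:, d]).max()
        # Keep the original sign; zeros are treated as positive.
        sign = np.where(hidden[selected, d] < 0, -1.0, 1.0)
        out[selected, d] = sign * ref
    return out
```

In a real model, a function like this would be applied to the hidden states of chosen transformer blocks during sampling (for example via a forward hook); which blocks and which channels to use is not specified here and would have to come from the paper.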
Results
Side-by-side video comparisons: Baseline (left) vs. +Ours (right).
Wan2.1-1.3B
A pixel art astronaut, clad in a white spacesuit with blue accents and a reflective
helmet, floats gracefully through the vast expanse of space...
A sophisticated couple, dressed in elegant evening attire, walks down a dimly lit
street...
Pig with wings flying above a diamond mountain.
In a cozy, sunlit kitchen, a vintage chrome toaster sits on a wooden countertop...
Silk dress swaying beside a velvet curtain.
A sleek, midnight blue sedan cruises down a quiet, tree-lined suburban street...
A sleek, modern airplane, painted in a striking blue and white livery, sits on a sunlit
runway...
A playful tabby cat with bright green eyes dashes across a sunlit meadow...
A majestic elephant stands in a sunlit savannah...
A majestic giraffe strolls gracefully through a sunlit savannah...
A majestic stone bridge arches gracefully over a serene river...
A plump rabbit, adorned in a flowing purple robe with golden embroidery...
In the golden light of an African savanna, a majestic giraffe with its long neck
gracefully bends to nibble on the tender leaves of an acacia tree...
A sleek, modern bicycle with a matte black frame and bright red accents stands parked
on a quiet, cobblestone street...
A vibrant red truck, gleaming under the midday sun, rumbles down a quiet, tree-lined
suburban street...
A person with short, curly hair and wearing a cozy, oversized sweater stands in a
warmly lit room...
A vibrant purple umbrella opens against a backdrop of a bustling city street...
A majestic shark glides through the swirling, vibrant waters in Van Gogh style...
A refined couple navigates a bustling street under a heavy downpour...
CogVideoX-5B
A playful panda sits on a wooden swing set in a lush bamboo forest...
A joyful Corgi with a fluffy coat frolics in a sunlit park...
A majestic chestnut horse with a flowing mane stands at the edge of a crystal-clear
river...
A massive grizzly bear prowls through a dense, misty forest...
A sophisticated couple walks hand-in-hand through a bustling city street, animated in a
charming hand-drawn style...
A vibrant green parrot with iridescent feathers perches on a delicate branch...
A sophisticated couple navigates through a bustling city street under a heavy
downpour...
A spirited cow with a glossy brown coat gallops across a lush, green meadow...
A vibrant carousel spins under a twilight sky, its golden lights twinkling like
stars...
A majestic brown bear begins its ascent up a towering pine tree in a dense forest...
A colossal iceberg drifts majestically in the frigid, azure waters of the Arctic
Ocean...
A playful panda stands on a surfboard, riding gentle waves during a breathtaking
sunset.
A lone rider, clad in a sleek black leather jacket, navigates a winding mountain road
on a powerful motorcycle...
A sleek, vibrant orange sports car glides effortlessly along a winding coastal road...
A sleek, animated shark glides gracefully through the vibrant, turquoise waters of the
ocean...
A majestic great white shark glides effortlessly through the crystal-clear, azure
waters of the ocean...
A plush teddy bear, with soft brown fur and a red bow tie, sits on a lush green lawn
under a bright, sunny sky...
Wan2.2-5B
A sleek, emerald-green sports car glistens under the midday sun, parked on a winding
coastal road...
A majestic medieval stone tower stands tall against a vibrant sunset backdrop...
A lone bicycle glides effortlessly through a vast, snow-covered field under a pale
winter sky...
A couple, elegantly dressed, navigates a bustling city street under a heavy downpour...
In a charming Parisian cafe, a panda sits at a quaint wooden table, sipping coffee...
A sleek, vintage bicycle with a leather saddle glides along a sun-dappled path...
In a magical forest, a charming koala bear sits at a grand piano...
A majestic giraffe bends down to drink from a serene river, surrounded by lush
greenery...
A majestic zebra stands in the golden savannah, its stripes contrasting vividly...
In a rustic barn, a person kneels beside a gentle cow, preparing to milk...
A sleek, vibrant orange sports car glides along a winding coastal road at sunset...
A serene individual sits comfortably in a cozy, softly lit room, wearing a plush white
robe...
A thrill-seeker leaps off a towering cliff, the vast canyon below stretching out...
A vintage steam locomotive chugs along a winding railway through picturesque
countryside...
A bright yellow city bus, filled with weary commuters, is stuck in bumper-to-bumper
traffic on a bustling urban street during rush hour...
A lone rider, clad in a sleek black leather jacket, navigates a winding mountain road
on a powerful motorcycle...
A serene individual, dressed in a flowing white shirt and dark trousers, sits
cross-legged on a grassy hilltop at sunset, playing a wooden flute...
BibTeX
Please consider citing our paper if you find it useful in your research.
@article{cheng2026stas,
title={Steering Video Diffusion Transformers with Massive Activations},
author={Cheng, Xianhang and Zheng, Yujian and Xie, Zhenyu and Liao, Tingting and Li, Hao},
journal={arXiv preprint arXiv:2603.17825},
year={2026}
}