ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
🫖 Introduction
In this work, we present ERTACache, a principled and efficient caching framework for accelerating diffusion model inference. By decomposing cache-induced degradation into feature shift and step amplification errors, we develop a dual-path correction strategy that combines offline-calibrated reuse scheduling, trajectory-aware timestep adjustment, and closed-form residual rectification. The following figure gives an overview of our ERTACache framework, which adopts a dual-dimensional correction strategy: (1) we first perform offline policy calibration by searching for a globally effective cache schedule using residual error profiling; (2) we then introduce a trajectory-aware timestep adjustment mechanism to mitigate integration drift caused by reused features; (3) finally, we propose an explicit error rectification that analytically approximates and rectifies the additive error introduced by cached outputs, enabling accurate reconstruction with negligible overhead.
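Conceptually, a cache schedule decides at which denoising steps the model's output is reused rather than recomputed, and a rectification term compensates for the resulting error. The toy sketch below illustrates this idea with a linear stand-in model and a scalar rectification factor; all names here (`toy_model`, `gamma`, `cache_schedule`) are illustrative assumptions, not ERTACache's actual API or its closed-form rectification.

```python
import numpy as np

def toy_model(x, t):
    # Stand-in "denoiser" for illustration only: a real diffusion model
    # predicts noise/velocity from (x, t); here we use a simple linear map.
    return 0.1 * x

def sample(x0, timesteps, cache_schedule=None, gamma=1.0):
    # Euler sampler with optional output caching. When cache_schedule[i]
    # is True, step i reuses the last computed model output, scaled by a
    # rectification factor gamma, instead of re-running the model.
    x, cached = x0, None
    num_steps = len(timesteps) - 1
    for i in range(num_steps):
        dt = timesteps[i] - timesteps[i + 1]
        if cache_schedule is not None and cache_schedule[i] and cached is not None:
            eps = gamma * cached          # rectified reuse of cached output
        else:
            eps = toy_model(x, timesteps[i])
            cached = eps
        x = x - dt * eps                  # standard Euler update
    return x

x0 = np.ones(4)
ts = np.linspace(1.0, 0.0, 11)            # 10 steps from t=1 down to t=0
full = sample(x0, ts)                      # every step runs the model
fast = sample(x0, ts, cache_schedule=[i % 2 == 1 for i in range(10)])
print(float(np.abs(full - fast).max()))    # small gap despite skipped calls
```

With this alternate-step schedule, half of the model calls are skipped while the final state stays close to the full-computation result; ERTACache's contribution is choosing the schedule offline and correcting the reuse error analytically rather than with a fixed scalar.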
As shown in the figure below, ERTACache preserves fine-grained visual details and frame-to-frame consistency, outperforming TeaCache and matching the non-cache reference. In video generation with CogVideoX, Wan2.1-1.3B, and OpenSora 1.2, ERTACache achieves noticeably better temporal consistency, particularly between the first and last frames. Applied to the FLUX-dev 1.0 image model, it enhances visual richness and detail. These results highlight ERTACache as an effective balance of visual quality and computational efficiency for consistent video generation.
Unlike prior heuristics-based methods, ERTACache provides a theoretically grounded yet lightweight solution that significantly reduces redundant computations while maintaining high-fidelity outputs. Empirical results across multiple benchmarks validate its effectiveness and generality, highlighting its potential as a practical solution for efficient generative sampling.
🎉 Supported Models
Text to Video
- ERTACache4Wan2.1
- ERTACache4CogVideoX-2B
- ERTACache4OpenSora1.2

Text to Image
- ERTACache4FLUX
📈 Inference Comparisons on a Single A800
| Model | Method | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Latency (s) ↓ |
|---|---|---|---|---|---|
| OpenSora 1.2 | TeaCache | 0.2511 | 0.7477 | 19.10 | 19.84 |
| OpenSora 1.2 | ERTACache | 0.1659 | 0.8170 | 22.34 | 18.04 |
| CogVideoX-2B | TeaCache | 0.2057 | 0.7614 | 20.97 | 26.88 |
| CogVideoX-2B | ERTACache | 0.1012 | 0.8702 | 26.44 | 26.78 |
| Wan2.1-1.3B | TeaCache | 0.2913 | 0.5685 | 16.17 | 99.5 |
| Wan2.1-1.3B | ERTACache | 0.1095 | 0.8200 | 23.77 | 91.7 |
| FLUX-dev 1.0 | TeaCache | 0.4427 | 0.7445 | 16.47 | 14.21 |
| FLUX-dev 1.0 | ERTACache | 0.3029 | 0.8962 | 20.51 | 14.01 |
Installation
The runtime environment setup depends on the specific model. For FLUX, for example, install the FLUX dependencies:
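The exact package list is model-specific and not reproduced here; as an illustrative sketch only (package names are assumptions, not this repository's pinned requirements), a typical diffusers-based FLUX environment might be created with:

```shell
# Illustrative only: common dependencies for running FLUX via diffusers.
# Consult the repository's own requirements file for the authoritative list.
pip install torch torchvision diffusers transformers accelerate sentencepiece
```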
Usage
For all the supported models, enter the corresponding folder (for example, go to \ERTACache4FLUX), then use the following command; the outputs are saved in the .\sample folder.
💐 Acknowledgement
This repository is built on VideoSys, Diffusers, Open-Sora, CogVideoX, FLUX, and Wan2.1. Thanks for their contributions!
🔒 License