These innovations target a 73% reduction in per-token inference FLOPs and a 90% reduction in KV cache memory footprint compared with DeepSeek-V3.2.
If realized, these gains would make V4 substantially cheaper to serve than its predecessor, which is the central consideration when weighing an upgrade.