Mixed-Precision Training Efficiency in Compute-Constrained ML Systems
Keywords:
Mixed-Precision Training, Gradient Stability, Generalization Robustness

Abstract
Mixed-precision training has become a practical approach for accelerating deep neural network
training in compute-constrained environments, but its effectiveness depends on maintaining gradient
fidelity and stable convergence behavior. By executing forward and backward passes in reduced
precision while retaining master parameters in higher precision, mixed-precision techniques reduce
memory usage and improve arithmetic throughput. However, precision reduction introduces
quantization noise and increases the risk of gradient underflow, making loss scaling and selective
precision control essential. This study evaluates mixed-precision training across multiple neural
architectures, examining gradient stability, convergence trajectories, and generalization performance
relative to full-precision training. The results show that, when dynamic loss scaling and selective
precision retention are applied, mixed-precision models achieve comparable or improved
generalization by converging toward flatter minima while significantly increasing training efficiency.
These findings demonstrate that mixed-precision training is not merely an optimization for hardware
utilization, but a convergence-shaping strategy that influences training dynamics and model
robustness.
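
To make the mechanism concrete, the sketch below shows a minimal mixed-precision training loop with dynamic loss scaling, written against PyTorch's automatic mixed precision (AMP) utilities. It is illustrative only: the toy model, optimizer, and hyperparameters are placeholders and do not reflect the architectures or settings evaluated in this study.

# A minimal sketch of mixed-precision training with dynamic loss scaling,
# using PyTorch's automatic mixed precision (AMP) utilities. The model, data,
# and hyperparameters are placeholders, not the setups evaluated in this study.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# The optimizer holds master parameters in FP32; forward and backward passes
# run in reduced precision inside the autocast region.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

# GradScaler implements dynamic loss scaling: the loss is multiplied by a scale
# factor so small gradients stay above the FP16 underflow threshold, and the
# factor is reduced automatically whenever an overflow (inf/NaN) is detected.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(100):
    x = torch.randn(32, 128, device=device)          # dummy batch
    y = torch.randint(0, 10, (32,), device=device)   # dummy labels

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):
        # Reduced-precision forward pass; autocast keeps precision-sensitive
        # ops (e.g., reductions inside the loss) in FP32.
        loss = criterion(model(x), y)

    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscale gradients, skip step on overflow
    scaler.update()                 # adjust the scale factor dynamically

Keeping the master weights in FP32 inside the optimizer ensures that small updates, which would fall below FP16 resolution if accumulated directly in half-precision parameters, are not lost across steps.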