Neural Compression Limits in Parameter-Efficient AI Foundation Models
Keywords: Neural Compression, Representation Stability, Parameter-Efficient Models

Abstract
Parameter-efficient neural compression techniques are increasingly used to reduce the computational
cost of deploying foundation models in large-scale, real-time inference environments. However,
compression modifies internal representational structures, influencing semantic coherence, reasoning
continuity, and long-term stability. This study evaluates structured pruning, low-rank factorization,
quantization, and sparse expert routing approaches to identify the boundary conditions under which
compression preserves versus degrades latent representation geometry. Results show that low-rank
approximation maintains stable semantic structure and inference robustness, while aggressive pruning
and low-precision quantization introduce cumulative representational drift over time. The findings
highlight that compression must be managed with awareness of representational topology, temporal
workload patterns, and operational reliability requirements, particularly when models are integrated
into interactive or regulated enterprise systems.
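As an illustrative, hedged sketch (not the study's actual evaluation protocol), the short Python listing below shows one way the kind of comparison described above could be instrumented: a toy linear layer is replaced by its truncated-SVD low-rank approximation, and the resulting shift in output geometry is summarized by the mean cosine similarity between the original and compressed activations. The layer dimensions, synthetic inputs, and helper names here are hypothetical.

    # Minimal sketch, assuming a toy linear layer and synthetic inputs
    # rather than a real foundation model.
    import numpy as np

    rng = np.random.default_rng(0)

    def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
        """Best rank-r approximation of W (Frobenius norm) via truncated SVD."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    def mean_cosine_similarity(A: np.ndarray, B: np.ndarray) -> float:
        """Average cosine similarity between matched rows of two activation matrices."""
        num = np.sum(A * B, axis=1)
        den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + 1e-12
        return float(np.mean(num / den))

    # Toy layer: 512-dimensional inputs projected to 512 hidden features.
    W = rng.standard_normal((512, 512)) / np.sqrt(512)
    X = rng.standard_normal((1024, 512))        # batch of synthetic inputs
    H_full = X @ W.T                            # uncompressed activations

    for r in (256, 64, 16):
        H_lr = X @ low_rank_approx(W, r).T      # activations after low-rank compression
        sim = mean_cosine_similarity(H_full, H_lr)
        print(f"rank {r:>3}: mean cosine similarity to full-rank activations = {sim:.4f}")

In this sketch, higher retained rank yields activations closer to the uncompressed layer; tracking such a similarity score across ranks is one simple proxy for the boundary between geometry-preserving and geometry-degrading compression.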