Explainability Fidelity Metrics for Post-Hoc Model Interpretation
Keywords:
Explainable AI, Explanation Fidelity, Post-Hoc Interpretation, Causal Attribution

Abstract
Post-hoc explanation methods are widely used to interpret complex machine learning models, yet the fidelity of
these explanations, that is, how accurately they reflect the model’s true reasoning, remains difficult to assess. Explanations
that are easy to understand may oversimplify or distort the decision logic, while highly detailed explanations may
be accurate but unusable in practice. This study presents a structured evaluation framework for measuring
explainability fidelity through local sensitivity testing, global attribution coherence, representation-space
alignment, and causal influence validation. Experimental results show that many commonly used attribution
techniques generate persuasive but mechanistically incorrect explanations, particularly in deep models with
distributed internal representations. Methods that incorporate causal perturbation and representation-level
reasoning exhibit significantly higher fidelity. Additionally, deployment tests in cloud-integrated Oracle APEX
environments reveal that explanation stability depends on system execution context, reinforcing that fidelity is
both a modeling and operational concern. The findings provide a foundation for selecting and validating post-hoc
interpretability techniques in high-stakes enterprise applications.
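
To make the kind of fidelity check discussed above concrete, the sketch below illustrates a simple perturbation-based local sensitivity test: each feature is ablated to a baseline value one at a time, and the explainer's attributions are correlated with the measured output changes. This is an illustrative assumption for exposition, not the evaluation framework of this study; the names model, perturbation_fidelity, and the toy linear setup are hypothetical.

    # Illustrative sketch only: a perturbation-based fidelity check in the
    # spirit of local sensitivity and causal influence testing.
    # `model` (any callable returning a scalar for a 1-D feature vector) and
    # `attributions` (the explainer's per-feature scores) are assumed inputs.
    import numpy as np

    def perturbation_fidelity(model, x, attributions, baseline=None):
        """Correlate each feature's attributed importance with the actual
        change in model output when that feature alone is reset to a
        baseline value. A correlation near 1 suggests the explanation
        tracks causal influence; values near 0 suggest low fidelity."""
        x = np.asarray(x, dtype=float)
        baseline = np.zeros_like(x) if baseline is None else np.asarray(baseline, dtype=float)
        original = model(x)

        effects = np.empty_like(x)
        for i in range(len(x)):
            perturbed = x.copy()
            perturbed[i] = baseline[i]          # ablate one feature at a time
            effects[i] = original - model(perturbed)

        # Pearson correlation between attributed and measured influence.
        return float(np.corrcoef(attributions, effects)[0, 1])

    # Toy check with a linear model whose exact per-feature contributions
    # (w * x) are known: a faithful explanation scores ~1.0, while an
    # uninformative random explanation scores near 0.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        w, x = rng.normal(size=8), rng.normal(size=8)
        model = lambda v: float(w @ v)
        print("faithful :", perturbation_fidelity(model, x, w * x))
        print("random   :", perturbation_fidelity(model, x, rng.normal(size=8)))

In this toy setting the ground-truth contributions are known exactly, which is what makes the correlation interpretable; applying such a test to deep models with distributed representations requires the richer causal and representation-level checks the abstract describes.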