Measuring Explanation Faithfulness in Post-Hoc Model Interpretation Frameworks
Keywords:
Explainable AI, Explanation Fidelity, Post-Hoc Interpretation, Causal Attribution

Abstract
Post-hoc explanation methods are widely used to interpret complex machine learning models, yet the fidelity of these explanations, that is, how accurately they reflect the model's true reasoning, remains difficult to assess. Explanations that are easy to understand may oversimplify or distort the decision logic, while highly detailed explanations may be accurate but unusable in practice. This study presents a structured evaluation framework for measuring explanation fidelity through local sensitivity testing, global attribution coherence, representation-space alignment, and causal influence validation. Experimental results show that many commonly used attribution techniques generate persuasive but mechanistically incorrect explanations, particularly in deep models with distributed internal representations. Methods that incorporate causal perturbation and representation-level reasoning exhibit significantly higher fidelity. Additionally, deployment tests in cloud-integrated Oracle APEX environments reveal that explanation stability depends on system execution context, reinforcing that fidelity is both a modeling and operational concern. The findings provide a foundation for selecting and validating post-hoc interpretability techniques in high-stakes enterprise applications.
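To make the notion of local sensitivity testing mentioned above concrete, the following is a minimal illustrative sketch, not the paper's implementation: it trains a toy logistic regression, uses a simple coefficient-times-deviation attribution as a stand-in for any post-hoc attribution method, and compares the prediction shift caused by masking the most highly attributed features against masking randomly chosen ones. All model, dataset, and function names here are hypothetical choices for the example.

```python
# Minimal sketch of a perturbation-based local fidelity check (assumed setup,
# not the framework's actual code). A faithful attribution should identify
# features whose masking moves the model's prediction more than random masking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def prediction_shift(model, x, feature_idx, baseline):
    """Absolute change in positive-class probability after masking the given features."""
    x_masked = x.copy()
    x_masked[feature_idx] = baseline[feature_idx]
    p_orig = model.predict_proba(x.reshape(1, -1))[0, 1]
    p_mask = model.predict_proba(x_masked.reshape(1, -1))[0, 1]
    return abs(p_orig - p_mask)

# Hypothetical attribution: |coefficient * deviation from baseline|, standing in
# for whichever post-hoc attribution method is under evaluation.
x = X[0]
baseline = X.mean(axis=0)
attribution = np.abs(model.coef_[0] * (x - baseline))

k = 5
top_k = np.argsort(attribution)[-k:]                       # most attributed features
random_k = rng.choice(X.shape[1], size=k, replace=False)   # random control features

shift_top = prediction_shift(model, x, top_k, baseline)
shift_rand = prediction_shift(model, x, random_k, baseline)
print(f"shift when masking top-{k} attributed features: {shift_top:.3f}")
print(f"shift when masking {k} random features:         {shift_rand:.3f}")
```

Under this kind of test, one simple fidelity indicator is the ratio of the attributed-feature shift to the random-feature shift; values near 1 suggest the attribution is no more informative than chance.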