Comparisons of classification accuracy statements have generally been based on tests of difference (inequality), even though other scenarios and approaches may be more appropriate. Following a discussion of tests of difference, procedures are outlined for two scenarios in which interest focuses on the similarity of the accuracy values: non-inferiority and equivalence. It is also suggested that the confidence interval of the difference in classification accuracy may be used alongside, or instead of, conventional hypothesis testing to reveal more information about the disparity between the classification accuracy values compared.
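As an illustration of how a single confidence interval can address all three questions (difference, non-inferiority, equivalence), here is a minimal sketch. It is not taken from the paper. It assumes the two accuracies come from independent test samples and uses a simple Wald interval for the difference in two proportions. The function names, the 0.05 margin and the example counts are all hypothetical. The same 95% interval is used for every verdict to keep the sketch short; a formal equivalence test is often read from a 90% interval instead.

```python
import math


def accuracy_difference_ci(correct_a, n_a, correct_b, n_b, z=1.96):
    """Wald confidence interval for the accuracy difference (A minus B).

    Assumes independent samples; z=1.96 gives an approximate 95% interval.
    Returns (difference, lower bound, upper bound).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


def interpret_interval(lower, upper, margin):
    """Read difference, non-inferiority and equivalence off one interval.

    `margin` is an assumed, user-chosen largest difference in accuracy
    that is still regarded as unimportant.
    """
    return {
        # Difference: the interval excludes zero.
        "different": lower > 0 or upper < 0,
        # Non-inferiority of A: the interval lies wholly above -margin.
        "non_inferior": lower > -margin,
        # Equivalence: the interval lies wholly inside (-margin, +margin).
        "equivalent": lower > -margin and upper < margin,
    }


if __name__ == "__main__":
    # Hypothetical counts: A labels 850 of 1000 cases correctly, B 830 of 1000.
    diff, lower, upper = accuracy_difference_ci(850, 1000, 830, 1000)
    print(f"difference = {diff:.3f}, interval = ({lower:.3f}, {upper:.3f})")
    print(interpret_interval(lower, upper, margin=0.05))
```

With these hypothetical counts the interval spans zero, so a test of difference alone would report only "no significant difference". The interval adds that A is non-inferior to B at the chosen margin, but that equivalence is not demonstrated, because the upper bound slightly exceeds the margin.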
Foody, G. M. (2009). Classification accuracy comparison: hypothesis tests and the use of confidence intervals in evaluations of difference, equivalence and non-inferiority. Remote Sensing of Environment, 113(8). https://doi.org/10.1016/j.rse.2009.03.014