A comparative study of estimation methods in quantum tomography

As quantum tomography is becoming a key component of the quantum engineering toolbox, there is a need for a deeper understanding of the multitude of estimation methods available. Here we investigate and compare several such methods: maximum likelihood, least squares, generalised least squares, positive least squares, thresholded least squares and projected least squares. The common thread of the analysis is that each estimator projects the measurement data onto a parameter space with respect to a specific metric, thus allowing us to study the relationships between different estimators. The asymptotic behaviour of the least squares and the projected least squares estimators is studied in detail for the case of the covariant measurement and a family of states of varying ranks. This gives insight into the rank-dependent risk reduction for the projected estimator, and uncovers an interesting non-monotonic behaviour of the Bures risk. These asymptotic results complement recent non-asymptotic concentration bounds of \cite{GutaKahnKungTropp} which point to strong optimality properties, and high computational efficiency of the projected linear estimators. To illustrate the theoretical methods we present results of an extensive simulation study. An app running the different estimators has been made available online.

More recently, quantum tomography has become a crucial validation tool current quantum technology applications [38,58,29,67]. The experimental challenges have stimulated research in new directions such as compressed sensing [35,54,27,70,20,5,3], estimation of permutationally invariant states [73], adaptive and selflearning tomography [55,34,60,26,39,63], incomplete tomography [74], minimax bounds [25,4], Bayesian estimation [15,33], and confidence regions [7,18,71,24,53]. Since 'full tomography' becomes impossible for systems composed of even a 1 arXiv:1901.07991v1 [quant-ph] 23 Jan 2019 f < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 j z U X Q l i 6 6 y S g D b 8 y D n C 2 L x + t k s = " > A A A B + X i c b V C 7 T s M w F L 0 p r 1 J e A U Y W i x a J q U q 6 A F s l F s Y i 0 Y f U R p X j O q 1 V x 4 5 s p 1 I V 9 U 9 Y G E C I l T 9 h 4 2 9 w 2 g z Q c i T L R + f c K x + f M O F M G 8 / 7 d k p b 2 z u 7 e + X 9 y s H h 0 f G J e 3 r W 0 T J V h L a J 5 F L 1 Q q w p Z 4 K 2 D T O c 9 h J F c R x y 2 g 2 n 9 7 n f n V G l m R R P Z p 7 Q I M Z j w S J G s L H S 0 H V r g 1 D y k Z 7 H 9 s q i R W 3 o V r 2 6 t w T a J H 5 B q l C g N X S / B i N J 0 p g K Q z j W u u 9 7 i Q k y r A w j n C 4 q g 1 T T B J M p H t O + p Q L H V A f Z M v k C X V l l h C K p 7 B E G L d X f G x m O d Z 7 N T s b Y T P S 6 l 4 v / e f 3 U R L d B x k S S G i r I 6 q E o 5 c h I l N e A R k x R Y v j c E k w U s 1 k R m W C F i b F l V W w J / v q X N 0 m n U f e 9 u v / Y q D b v i j r K c A G X c A 0 + 3 E A T H q A F b S A w g 2 d 4 h T c n c 1 6 c d + d j N V p y i p 1 z + A P n 8 w d r q 5 N 5 < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 j z U X Q l i 6 6 y S g D b 8 y D n C 2 L x + t k s = " > A A A B + X i c b V C 7 T s M w F L 0 p r 1 J e A U Y W i x a J q U q 6 A F s l F s Y i 0 Y f U R p X j O q 1 V x 4 5 s p 1 I V 9 U 9 Y G E C I l T 9 h 4 2 9 w 2 g z Q c i T L R + f c K x + f M O F M G 8 / 7 d k p b 2 z u 7 e + X 9 y s H h 0 f G J e 3 r W 0 T J V h L a J 5 F L 1 Q q w p Z 4 K 2 D T O c 9 h J F c R x y 2 g 2 n 9 7 n f n V G l m R R P Z p 7 Q I M Z j w S J G s L H S 0 H V r g 1 D y k Z 7 H 9 s q i R W 3 o V r 2 6 t w T a J H 5 B q l C g N X S / B i N J 0 p g K Q z j W u u 9 7 i Q k y r A w j n C 4 q g 1 T T B J M p H t O + p Q L H V A f Z M v k C X V l l h C K p 7 B E G L d X f G x m O d Z 7 N T s b Y T P S 6 l 4 v / e f 3 U R L d B x k S S G i r I 6 q E o 5 c h I l N e A R k x R Y v j c E k w U s 1 k R m W C F i b F l V W w J / v q X N 0 m n U f e 9 u v / Y q D b v i j r K c A G X c A 0 + 3 E A T H q A F b S A w g 2 d 4 h T c n c 1 6 c d + d j N V p y i p 1 z + A P n 8 w d r q 5 N 5 < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 j z U X Q l i 6 6 y S g D b 8 y D n C 2 L x + t k s = " > A A A B + X i c b V C 7 T s M w F L 0 p r 1 J e A U Y W i x a J q U q 6 A F s l F s Y i 0 Y f U R p X j O q 1 V x 4 5 s p 1 I V 9 U 9 Y G E C I l T 9 h 4 2 9 w 2 g z Q c i T L R + f c K x + f M O F M G 8 / 7 d k p b 2 z u 7 e + X 9 y s H h 0 f G J e 3 r W 0 T J V h L a J 5 F L 1 Q q w p Z 4 K 2 D T O c 9 h J F c R x y 2 g 2 n 9 7 n f n V G l m R R P Z p 7 Q I M Z j w S J G s L H S 0 H V r g 1 D y k Z 7 H 9 s q i R W 3 o V r 2 6 t w T a J H 5 B q l C g N X S / B i N J 0 p g K Q z j W u u 9 7 i Q k y r A w j n C 4 q g 1 T T B J M p H t O + p Q L H V A f Z M v k C X V l l h C K p 7 B E G L d X f G x m O d Z 7 N T s b Y T P S 6 l 4 v / e f 3 U R L d B x k S S G i r I 6 q E o 5 c h I l N e A R k x R Y v j c E k w U s 1 k R m W C F i b F l V W w J / v q X N 0 m n U f e 9 u v / Y q D b v i j r K c A G X c A 0 + 3 E A T H q A F b S A w g 2 d 4 h T c n c 1 6 c d + d j N V p y i p 1 z + A P n 8 w d r q 5 N 5 < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 j z U X Q l i 6 6 y S g D b 8 y D n C 2 L x + t k s = " > A A A B + X i c b V C 7 T s M w F L 0 p r 1 J e A U Y W i x a J q U q 6 A F s l F s Y i 0 Y f U R p X j O q 1 V x 4 5 s p 1 I V 9 U 9 Y G E C I l T 9 h 4 2 9 w 2 g z Q c i T L R + f c K x + f M O F M G 8 / 7 d k p b 2 z u 7 e + X 9 y s H h 0 f G J e 3 r W 0 T J V h L a J 5 F L 1 Q q w p Z 4 K 2 D T O c 9 h J F c R x y 2 g 2 n 9 7 n f n V G l m R R P Z p 7 Q I M Z j w S J G s L H S 0 H V r g 1 D y k Z 7 H 9 s q i R W 3 o V r 2 6 t w T a J H 5 B q l C g N X S / B i N J 0 p g K Q z j W u u 9 7 i Q k y r A w j n C 4 q g 1 T T B J M p H t O + p Q L H V A f Z M v k C X V l l h C K p 7 B E G L d X f G x m O d Z 7 N T s b Y T P S 6 l 4 v / e f 3 U R L d B x k S S G i r I 6 q E o 5 c h I l N e A R k x R Y v j c E k w U s 1 k R m W C F i b F l V W w J / v q X N 0 m n U f e 9 u v / Y q D b v i j r K c A G X c A 0 + 3 E A T H q A F b S A w g 2 d 4 h T c n c 1 6 c d + d j N V p y i p 1 z + A P n 8 w d r q 5 N 5 < / l a t e x i t > M < l a t e x i t s h a 1 _ b a s e 6 4 = " J 6 n 4 o p e r q M k p r f V + 1 S 5 1 Y H s L Y 7 M = " > A A A B 9 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B F v B V Z n p R t 0 V 3 L g R K t g H t G P J p J k 2 N J M M S U Y p Q / / D j Q t F 3 P o v 7 v w b M + 0 s t P V A 4 H D O v d y T E 8 S c a e O 6 3 0 5 h b X 1 j c 6 u 4 X d r Z 3 d s / K B 8 e t b V M F K E t I r l U 3 Q B r y p m g L c M M p 9 1 Y U R w F n H a C y X X m d x 6 p 0 k y K e z O N q R / h k W A h I 9 h Y 6 a G K + h E 2 Y 4 J 5 e j u r D s o V t + b O g V a J l 5 M K 5 G g O y l / 9 o S R J R I U h H G v d 8 9 z Y + C l W h h F O Z 6 V + o m m M y Q S P a M 9 S g S O q / X S e e o b O r D J E o V T 2 C Y P m 6 u + N F E d a T 6 P A T m Y Z 9 b K X i f 9 5 v c S E l 3 7 K R J w Y K s j i U J h w Z C T K K k B D p i g x f G o J J o r Z r I i M s c L E 2 K J K t g R v + c u r p F 2 v e W 7 N u 6 t X G l d 5 H U U 4 g V M 4 B w 8 u o A E 3 0 I Q W E F D w D K / w 5 j w 5 L 8 6 7 8 7 E Y L T j 5 z j H 8 g f P 5 A 5 O h k d 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 6 n 4 o p e r q M k p r f V + 1 S 5 1 Y H s L Y 7 M = " > A A A B 9 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B F v B V Z n p R t 0 V 3 L g R K t g H t G P J p J k 2 N J M M S U Y p Q / / D j Q t F 3 P o v 7 v w b M + 0 s t P V A 4 H D O v d y T E 8 S c a e O 6 3 0 5 h b X 1 j c 6 u 4 X d r Z 3 d s / K B 8 e t b V M F K E t I r l U 3 Q B r y p m g L c M M p 9 1 Y U R w F n H a C y X X m d x 6 p 0 k y K e z O N q R / h k W A h I 9 h Y 6 a G K + h E 2 Y 4 J 5 e j u r D s o V t + b O g V a J l 5 M K 5 G g O y l / 9 o S R J R I U h H G v d 8 9 z Y + C l W h h F O Z 6 V + o m m M y Q S P a M 9 S g S O q / X S e e o b O r D J E o V T 2 C Y P m 6 u + N F E d a T 6 P A T m Y Z 9 b K X i f 9 5 v c S E l 3 7 K R J w Y K s j i U J h w Z C T K K k B D p i g x f G o J J o r Z r I i M s c L E 2 K J K t g R v + c u r p F 2 v e W 7 N u 6 t X G l d 5 H U U 4 g V M 4 B w 8 u o A E 3 0 I Q W E F D w D K / w 5 j w 5 L 8 6 7 8 7 E Y L T j 5 z j H 8 g f P 5 A 5 O h k d 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 6 n 4 o p e r q M k p r f V + 1 S 5 1 Y H s L Y 7 M = " > A A A B 9 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B F v B V Z n p R t 0 V 3 L g R K t g H t G P J p J k 2 N J M M S U Y p Q / / D j Q t F 3 P o v 7 v w b M + 0 s t P V A 4 H D O v d y T E 8 S c a e O 6 3 0 5 h b X 1 j c 6 u 4 X d r Z 3 d s / K B 8 e t b V M F K E t I r l U 3 Q B r y p m g L c M M p 9 1 Y U R w F n H a C y X X m d x 6 p 0 k y K e z O N q R / h k W A h I 9 h Y 6 a G K + h E 2 Y 4 J 5 e j u r D s o V t + b O g V a J l 5 M K 5 G g O y l / 9 o S R J R I U h H G v d 8 9 z Y + C l W h h F O Z 6 V + o m m M y Q S P a M 9 S g S O q / X S e e o b O r D J E o V T 2 C Y P m 6 u + N F E d a T 6 P A T m Y Z 9 b K X i f 9 5 v c S E l 3 7 K R J w Y K s j i U J h w Z C T K K k B D p i g x f G o J J o r Z r I i M s c L E 2 K J K t g R v + c u r p F 2 v e W 7 N u 6 t X G l d 5 H U U 4 g V M 4 B w 8 u o A E 3 0 I Q W E F D w D K / w 5 j w 5 L 8 6 7 8 7 E Y L T j 5 z j H 8 g f P 5 A 5 O h k d 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J 6 n 4 o p e r q M k p r f V + 1 S 5 1 Y H s L Y 7 M = " > A A A B 9 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B F v B V Z n p R t 0 V 3 L g R K t g H t G P J p J k 2 N J M M S U Y p Q / / D j Q t F 3 P o v 7 v w b M + 0 s t P V A 4 H D O v d y T E 8 S c a e O 6 3 0 5 h b X 1 j c 6 u 4 X d r Z 3 d s / K B 8 e t b V M F K E t I r l U 3 Q B r y p m g L c M M p 9 1 Y U R w F n H a C y X X m d x 6 p 0 k y K e z O N q R / h k W A h I 9 h Y 6 a G K + h E 2 Y 4 J 5 e j u r D s o V t + b O g V a J l 5 M K 5 G g O y l / 9 o S R J R I U h H G v d 8 9 z Y + C l W h h F O Z 6 V + o m m M y Q S P a M 9 S g S O q / X S e e o b O r D J E o V T 2 C Y P m 6 u + N F E d a T 6 P A T m Y Z 9 b K X i f 9 5 v c S E l 3 7 K R J w Y K s j i U J h w Z C T K K k B D p i g x f G o J J o r Z r I i M s c L E 2 K J K t g R v + c u r p F 2 v e W 7 N u 6 t X G l d 5 H U U 4 g V M 4 B w 8 u o A E 3 0 I Q W E F D w D K / w 5 j w 5 L 8 6 7 8 7 E Y L T j 5 z j H 8 g f P 5 A 5 O h k d 4 = < / l a t e x i t > H N y q 0 X 6 9 3 6 W I 1 W r H L n H P 7 A + v w B j F W S 6 Q = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H / f w 8 P c 1 x i j q w G d S w G G 5 1 b K G 2 z A = " > A A A B + X i c b V D L S s N A F L 2 p r 1 p f U Z d u B l v B V U m 6 U X c F N y 4 r 2 A e 0 I U w m k 3 b o Z B J m J o U S + i d u X C j i 1 j 9 x 5 9 8 4 a b P Q 1 g M D h 3 P u 5 Z 4 5 Q c q Z 0 o 7 z b V W 2 t n d 2 9 6 r 7 t Y P D o H N y q 0 X 6 9 3 6 W I 1 W r H L n H P 7 A + v w B j F W S 6 Q = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H / f w 8 P c 1 x i j q w G d S w G G 5 1 b K G 2 z A = " > A A A B + X i c b V D L S s N A F L 2 p r 1 p f U Z d u B l v B V U m 6 U X c F N y 4 r 2 A e 0 I U w m k 3 b o Z B J m J o U S + i d u X C j i 1 j 9 x 5 9 8 4 a b P Q 1 g M D h 3 P u 5 Z 4 5 Q c q Z 0 o 7 z b V W 2 t n d 2 9 6 r 7 t Y P D o H N y q 0 X 6 9 3 6 W I 1 W r H L n H P 7 A + v w B j F W S 6 Q = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H / f w 8 P c 1 x i j q w G d S w G G 5 1 b K G 2 z A = " > A A A B + X i c b V D L S s N A F L 2 p r 1 p f U Z d u B l v B V U m 6 U X c F N y 4 r 2 A e 0 I U w m k 3 b o Z B J m J o U S + i d u X C j i 1 j 9 x 5 9 8 4 a b P Q 1 g M D h 3 P u 5 Z 4 5 Q c q Z 0 o 7 z b V W 2 t n d 2 9 6 r 7 t Y P D o H N y q 0 X 6 9 3 6 W I 1 W r H L n H P 7 A + v w B j F W S 6 Q = = < / l a t e x i t > S d < l a t e x i t s h a 1 _ b a s e 6 4 = " y I U O E W 5 I M K B p 4 1 e B c u J t W k N a m D o = " > A A A B + X i c b V A 9 T 8 M w F H w p X 6 V 8 B R h Z L F o k p i r p A m y V W B i L o K V S G 0 W O 4 7 R W H S e y n U p V 1 H / C w g B C r P w T N v 4 N T p s B W k 6 y d L p 7 T + 9 8 Q c q Z 0 o 7 z b V U 2 N r e 2 d 6 q 7 t b 3 9 H S e y n U p V 1 H / C w g B C r P w T N v 4 N T p s B W k 6 y d L p 7 T + 9 8 Q c q Z 0 o 7 z b V U 2 N r e 2 d 6 q 7 t b 3 9 H S e y n U p V 1 H / C w g B C r P w T N v 4 N T p s B W k 6 y d L p 7 T + 9 8 Q c q Z 0 o 7 z b V U 2 N r e 2 d 6 q 7 t b 3 9 H S e y n U p V 1 H / C w g B C r P w T N v 4 N T p s B W k 6 y d L p 7 T + 9 8 Q c q Z 0 o 7 z b V U 2 N r e 2 d 6 q 7 t b 3 9 7 z C m 5 d 6 L 9 6 7 9 7 F s L X j 5 z C n 8 g f f 5 A y v n j s I = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " t g v Y A W 2 u C / z / 8 A l V n O s q P 9 + 6 X a g = " 7 z C m 5 d 6 L 9 6 7 9 7 F s L X j 5 z C n 8 g f f 5 A y v n j s I = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " t g v Y A W 2 u C / z / 8 A l V n O s q P 9 + 6 X a g = " > A A A B 7 n i c b V A 9 T w J B E J 3 D L 8 Q v 1 N J m I 5 h Y k T s a t S O x s c R E w A Q u Z G + Z g w 1 7 u 5 f d P R N C + B E 2 F h p j 6 + + x 8 9 + 4 w B U K v m S S l / d m M j M v S g U 3 1 v e / v c L G 5 t b 2 T n G 3 t L d / c H h U P j 5 p G 5 V p h i 2 m h N K P E T U o u M S W 5 V b g Y 6 q R J p H A T j S + n f u d J 9 S G K / l g J y m G C R 1 K H n N G r Z M 6 V d L T I 1 X t l y t + z V + A r J M g J x X I 0 e y X v 3 o D 7 z C m 5 d 6 L 9 6 7 9 7 F s L X j 5 z C n 8 g f f 5 A y v n j s I = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " t g v Y A W 2 u C / z / 8 A l V n O s q P 9 + 6 X a g = " > A A A B 7 n i c b V A 9 T w J B E J 3 D L 8 Q v 1 N J m I 5 h Y k T s a t S O x s c R E w A Q u Z G + Z g w 1 7 u 5 f d P R N C + B E 2 F h p j 6 + + x 8 9 + 4 w B U K v m S S l / d m M j M v S g U 3 1 v e / v c L G 5 t b 2 T n G 3 t L d / c H h U P j 5 p G 5 V p h i 2 m h N K P E T U o u M S W 5 V b g Y 6 q R J p H A T j S + n f u d J 9 S G K / l g J y m G C R 1 K H n N G r Z M 6 V d L T I 1 X t l y t + z V + A r J M g J x X I 0 e y X v 3 o D 7 z C m 5 d 6 L 9 6 7 9 7 F s L X j 5 z C n 8 g f f 5 A y v n j s I = < / l a t e x i t > R k⇥m < l a t e x i t s h a 1 _ b a s e 6 4 = " r J r z p 4 v v P 8 u B f q j B M p g X Q G p T 8 k o = " > A A A C A X i c b V C 7 T s M w F H X K q 5 R X g A W J x a J F Y q q S L s B W i Y W x I P q Q m l A 5 r t N a t Z 3 I d p C q K C z 8 C g s D C L H y F 2 z 8 D t H L J K 9 A C n C q C B t T T U j v V g S x A N G u s H k K v e 7 D 0 Q q G o k 7 P Y 2 J z 9 F I 0 J B i p I 0 0 s I 9 q H k d 6 H A T p b X a f T j x N O V G Q Z 7 W B X X X q z g x w m b g F q Y I C r Y H 9 5 Q 0 j n H A i N G Z I q b 7 r x N p P k d Q U M 5 J V v E S R G O E J G p G + o Q K Z R X 4 6 + y C D p 0 Y Z w j C S p o S G M / X 3 R I q 4 U l M e m M 7 8 X L X o 5 e J / X j / R 4 Y W f U h E n m g g 8 X x Q m D O o I 5 n H A I Z U E a z Y 1 B G F J z a 0 Q j 5 F E W J v Q K i Y E d / H l Z d J p 1 F 2 n 7 t 4 0 q s 3 L I o 4 y O A Y n 4 A y 4 4 B w 0 w T V o g T b A 4 B E 8 g 1 f w Z j 1 Z L 9 a 7 9 T F v L V n F z C H 4 A + v z B 0 P Z l r g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " r J r z p 4 v v P 8 u B f q j B M p g X Q G p T 8 k o = " > A A A C A X i c b V C 7 T s M w F H X K q 5 R X g A W J x a J F Y q q S L s B W i Y W x I P q Q m l A 5 r t N a t Z 3 I d p C q K C z 8 C g s D C L H y F 2 z 8 D t H L J K 9 A C n C q C B t T T U j v V g S x A N G u s H k K v e 7 D 0 Q q G o k 7 P Y 2 J z 9 F I 0 J B i p I 0 0 s I 9 q H k d 6 H A T p b X a f T j x N O V G Q Z 7 W B X X X q z g x w m b g F q Y I C r Y H 9 5 Q 0 j n H A i N G Z I q b 7 r x N p P k d Q U M 5 J V v E S R G O E J G p G + o Q K Z R X 4 6 + y C D p 0 Y Z w j C S p o S G M / X 3 R I q 4 U l M e m M 7 8 X L X o 5 e J / X j / R 4 Y W f U h E n m g g 8 X x Q m D O o I 5 n H A I Z U E a z Y 1 B G F J z a 0 Q j 5 F E W J v Q K i Y E d / H l Z d J p 1 F 2 n 7 t 4 0 q s 3 L I o 4 y O A Y n 4 A y 4 4 B w 0 w T V o g T b A 4 B E 8 g 1 f w Z j 1 Z L 9 a 7 9 T F v L V n F z C H 4 A + v z B 0 P Z l r g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " r J r z p 4 v v P 8 u B f q j B M p g X Q G p T 8 k o = " > A A A C A X i c b V C 7 T s M w F H X K q 5 R X g A W J x a J F Y q q S L s B W i Y W x I P q Q m l A 5 r t N a t Z 3 I d p C q K C z 8 C g s D C L H y F 2 z 8 D t H L J K 9 A C n C q C B t T T U j v V g S x A N G u s H k K v e 7 D 0 Q q G o k 7 P Y 2 J z 9 F I 0 J B i p I 0 0 s I 9 q H k d 6 H A T p b X a f T j x N O V G Q Z 7 W B X X X q z g x w m b g F q Y I C r Y H 9 5 Q 0 j n H A i N G Z I q b 7 r x N p P k d Q U M 5 J V v E S R G O E J G p G + o Q K Z R X 4 6 + y C D p 0 Y Z w j C S p o S G M / X 3 R I q 4 U l M e m M 7 8 X L X o 5 e J / X j / R 4 Y W f U h E n m g g 8 X x Q m D O o I 5 n H A I Z U E a z Y 1 B G F J z a 0 Q j 5 F E W J v Q K i Y E d / H l Z d J p 1 F 2 n 7 t 4 0 q s 3 L I o 4 y O A Y n 4 A y 4 4 B w 0 w T V o g T b A 4 B E 8 g 1 f w Z j 1 Z L 9 a 7 9 T F v L V n F z C H 4 A + v z B 0 P Z l r g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " r J r z p 4 v v P 8 u B f q j B M p g X Q G p T 8 k o = " > A A A C A X i c b V C 7 T s M w F H X K q 5 R X g A W J x a J F Y q q S L s B W i Y W x I P q Q m l A 5 r t N a t Z 3 I d p C q K C z 8 C g s D C L H y F 2 z 8 D t H L J K 9 A C n C q C B t T T U j v V g S x A N G u s H k K v e 7 D 0 Q q G o k 7 P Y 2 J z 9 F I 0 J B i p I 0 0 s I 9 q H k d 6 H A T p b X a f T j x N O V G Q Z 7 W B X X X q z g x w m b g F q Y I C r Y H 9 5 Q 0 j n H A i N G Z I q b 7 r x N p P k d Q U M 5 J V v E S R G O E J G p G + o Q K Z R X 4 6 + y C D p 0 Y Z w j C S p o S G M / X 3 R I q 4 U l M e m M 7 8 X L X o 5 e J / X j / R 4 Y W f U h E n m g g 8 X x Q m D O o I 5 n H A I Z U E a z Y 1 B G F J z a 0 Q j 5 F E W J v Q K i Y E d / H l Z d J p 1 F 2 n 7 t 4 0 q s 3 L I o 4 y O A Y n 4 A y 4 4 B w 0 w T V o g T b A 4 B E 8 g 1 f w Z j 1 Z L 9 a 7 9 T F v L V n F z C H 4 A + v z B 0 P Z l r g = < / l a t e x i t > r H e r Y / l a M k q d k 7 g D 6 z P H y 0 x k r Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H i 2 k U t X u K l n 8 6 p U s s r 7 l P 6 e Q 3 Q Y = " r H e r Y / l a M k q d k 7 g D 6 z P H y 0 x k r Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H i 2 k U t X u K l n 8 6 p U s s r 7 l P 6 e Q 3 Q Y = " r H e r Y / l a M k q d k 7 g D 6 z P H y 0 x k r Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H i 2 k U t X u K l n 8 6 p U s s r 7 l P 6 e Q 3 Q Y = " moderate number of qubits, research has focused on the estimation of states which belong to smaller dimensional models which nevertheless capture relevant physical properties, such as low rank states [48,17,37,49,36] and matrix product states [20,12,50].
In this paper we analyse and compare several estimation methods for fixed (non-adaptive) measurement designs, with a focus on risks (mean errors) for different loss functions, asymptotic theory, relationships between estimators and low rank behaviour. The measurement scenarios include repeated measurements with respect to Pauli bases (as customary in multiple qubits experiments), random bases measurements, and the covariant measurement. The loss functions are given by the Frobenius, trace-norm, operator-norm, Bures and Hellinger distances. Each section deals with one class of estimators and the results of a comparative simulations study are presented at the end of the paper. While most estimators have been previously considered in the literature, our aim is to investigate them from a common perspective, as projections of data onto the parameter space. Another aim is to understand and quantify the reduction in statistical error between an unconstrained estimator such as least squares, and an estimator which take into account the physical properties of the parameter space, such as the projected least squares estimator. Among the original results, we derive the asymptotic error rates of these estimators on a class of low rank states in the covariant measurement scenario. Finally, we discuss the computational efficiency of different methods.
Below we summarise the paper using Figure 1 for illustration. A measurement M on ddimensional system is a positive affine map from the convex set of states S d (constrained parameter space) onto the space of outcome probabilities P d ⊂ R z , where z is the number outcomes. The image of the set of trace-one selfadjoint matrices M 1 sa (C d ) (unconstrained parameter space) is a hyperplane L d of R z , which contains P d . For informationally complete measurements the map M is injective and we can identify all matrix estimators (whether positive or not) with their images in L d .
In section 3 we review the use of Fisher information in asymptotic normality theory and discuss to what extent it is applicable to the study of the maximum likelihood (ML) estimator in quantum tomography [10,14,66,75]. We distinguish between two ML estimators: the unconstrained estimatorρ uML , and the 'physical' estimatorρ ML . The former is the maximiser of the likelihood, seen as a function over M 1 sa (C d ), and may not be a density matrix. The latter performs the same optimisation but restricted to the physical space of density matrices S d ⊂ M 1 sa (C d ). For large sample size, the unconstrained estimator is asymptotically normal, while the constrained estimator is equal to its projection onto S d with respect to the metric defined by the classical Fisher information [66].
In section 4 we analyse the least squares (LS) estimatorρ LS . This exploits the linear dependence between probabilities and states which translates into a linear regression problem where f denotes the vector of observed frequencies of measurement outcomes, X is a fixed measurement design matrix, β is a vectorisation of the unknown state ρ, and is the statistical noise. The LS estimator minimises the prediction error Xβ − f 2 over all vectorsβ, and is the simplest and fastest estimator to compute. However, LS is significantly less accurate than ML, especially for low rank states and does not produce a physical state. Nevertheless, the LS has been the focus of recent investigations [17,72,36] as a first step towards more accurate estimators. Here we review some of the non-asymptotic concentration bounds for the operatornorm error of the LS estimator. We then study the asymptotic properties of the LS estimator in the context of covariant measurements, and rank r states ρ r with equal non-zero eigenvalues. By exploiting the symmetry of the measurement we compute the explicit expression of the risk with respect to the Frobenius distance. Furthermore, we show that for large d, the eigenvalues of the 'error matrix'ρ LS − ρ r are approximately distributed according to the Wigner semicircle law on the interval [−2 d/N , 2 d/N ], cf. Figure 2. This provides the asymptotic estimates of the operator-norm, and the trace-norm errors of 2 d/N and respectively 8d 3/2 /(3π √ N ), which complement the non-asymptotic bounds of [36].
In section 5 we discuss the generalised least squares (GLS) estimator. Unlike the LS estimator which minimises the prediction error, the GLS aims to optimise the estimation error, e.g. E β − β 2 and needs to take into account the covariance matrix of the multinomial noise . We show that for large samplesize N , the GLS estimator is asymptotically normal and is equivalent to the uML estimator. Section 6 reviews the thresholded least squares (TLS) estimator proposed in [17]. This is obtained by projecting the LS estimator onto the set of states whose non-zero eigenvalues are above a certain 'noise level' chosen in a data-driven fashion. The projection can be computed efficiently in terms of the eigenvalues of the LS estimator, and the truncation leads to significant error reduction for low rank states. In practice, an additional improvement is achieved by using the GLS estimator as starting point. Section 7 discusses the positive least squares (posLS) estimator which optimises the prediction error over the physical space of density matrices, rather than over the selfadjoint matrices as is the case for the LS estimator [46]. This leads to higher accuracy, but its computational complexity is similar to that of ML. However, in the case of the covariant measurements we find that posLS is equivalent to the projected least squares estimator discussed below, which can be computed efficiently. By restricting the GLS optimisation over density matrices we obtain the positive generalised least squares estimator (posGLS), which is shown to be asymptotically equivalent to the ML estimator. Section 8 deals with the projected least squares (PLS) estimator. This is obtained by projecting the LS estimator onto the space of density matrices with respect to the Frobenius distance [72,36]. It is faster than TLS, and has similar statistical properties. In [36] is was shown that the PLS estimator satisfies rank-dependent concentration bounds and its trace-norm error scales as O(r 2 · d · log d/ √ N ) for a broad class of 2-design measurements including mutually unbiased bases, stabiliser states, and symmetric informationally complete measurements, and as O(r 2 · d 1.6 · log d/ √ N ) for Pauli bases measurements. In this paper we focus on the asymptotic behaviour of the PLS estimator in the case of covariant measurements. Inspired by the techniques developed in [66] we show how the random matrix properties of the LS can be used to derive the asymptotic Frobenius and Bures risks of the PLS estimator for low rank states and large dimension d and samplesize N . In particular we uncover an interesting behaviour of the Bures risk which (unlike the Frobenius or the trace norm risks) increases steeply with rank for low rank states and then decreases towards full rank, cf. Figure 3.
Section 9 presents the results of a comparative simulations study of the proposed estimators. We simulated data for a range of states of 3 and 4 atoms of different ranks, with different samplesizes, and measurement setups. For each choice we produced 100 datasets in order to estimate risks of different estimators. The measurements are chosen to be either Pauli bases measurements (as standard in ion trap experiments) or random basis measurements. The different estimators and their risks corresponding to Frobenius, Bures, Hellinger, and trace-norm distances were computed and the results are illustrated and discussed, cf. Figures 4,5,6,7,8,9,10,and 11. A complete set of simulation results available online via an interactive Rshiny application at https://rudhacharya.shinyapps.io/plots/. We also created an online estimation app which computes the key estimators for a range of states of certain characteristics, or user uploaded states. This is available at https://shiny.maths.nottingham.ac.uk/shiny/qt_dashboard/.

Quantum tomography
In quantum tomography, the aim is to estimate an unknown state from outcomes of measurements performed on an ensemble of identical prepared systems. Although a large part of our theoretical considerations hold in a general setting, we choose to formulate the tomography problem in the specific context of a system consisting of multiple qubits and projective measurements with respect to several bases. This keeps the discussion closer to realistic experimental procedures, and facilitates the understanding of our simulation results.
We consider composite systems consisting of n qubits, with associated Hilbert space C d , where d = 2 n . The state ρ belongs to the space of d × d density matrices S d ⊂ M (C d ). In our analysis we will distinguish between the constrained parameter space S d whose elements are trace-one positive matrices, and the unconstrained parameter space M 1 sa (C d ) consisting of traceone selfadjoint matrices. This will allow us to consider procedure which produce constrained, or unconstrained estimators.
The measurement strategies consist of performing standard, von Neumann projective measure-ments with respect to a number of orthonormal bases (ONB), which are chosen deterministically or randomly. In particular, we focus on two scenarios: Pauli basis measurements and measurements that are drawn randomly from the uniform measure over all ONBs. While the former setup is commonly employed in experiments [38], the latter is less restrictive, more amenable to theoretical analysis, and can serve as a benchmark for current experiments. We also consider covariant measurements in the context of the asymptotic theory of the least squares and respectively projected least squares estimators in sections 4.3 and 8.1, and refer to these for further details.
In the Pauli bases setup, one measures an arbitrary Pauli observable σ x , σ y , or σ z on each of the n ions simultaneously. Therefore, each measurement is labelled by a sequence s = (s 1 , . . . , s n ) ∈ {x, y, z} n , and there are 3 n possible measurement bases. Such a measurement produces a ±1 , −1} n , s ∈ S } is a 2 n × k table whose columns are independent and contain all the counts in a given setting. Its probability is given by the product of multinomials Our goal is to statistically reconstruct the density matrix ρ from the counts dataset D. This can be seen as a statistical inverse problem of reversing the measurement map in the sense that given a sample D from p ρ (·|S), one would like to produce an estimatorρ(D) which is close to ρ with respect to some meaningful distance measure. The next section lists several figures of merit considered here.

Error functions
Let us denote the risk or mean error of an estimatorρ as E [D(ρ, ρ)], where D(ρ, ρ) represents a particular error function. In our theoretical analysis and simulations study we estimate the risk for several choices of the error function D(ρ, ρ), which are tabulated in

Fisher information, asymptotic normality and maximum likelihood
The maximum likelihood (ML) estimator is one of the most commonly used and well understood statistical tools with a universal range of applicability. Before considering its use in quantum tomography, we briefly review some of the key concepts and results related to the ML estimator and its asymptotic behaviour [51,78]. Consider the scenario in which we are given N independent samples X 1 , . . . , X N from a discrete probability distribution p θ over a countable set Ω; the probability distribution is assumed to depend smoothly on an unknown parameter θ which belongs to an open subset Θ of R p . The likelihood of the dataset is p θ (X 1 , . . . , X N ) = N j=1 p θ (X j ). The ML estimator is the point in Θ where the likelihood of the observed data attains its maximum valueθ ML := arg max θ ∈Θ p θ (X 1 , . . . , X N ).
The likelihood can be expressed in terms of the empirical distribution of the data f = j δ Xj /N which collects the frequencies of different outcomes in Ω. Indeed the log-likelihood can be written as (cf. (1)) where K(· ·) relative entropy (Kullback-Leibler divergence), and C, C do not depend on θ. The ML estimator is thus the closest point to f with respect to the relative entropŷ Asymptotics. The appeal of the ML estimator lies in part in its asymptotic optimality properties. For large enough sample sizes N , a central limit behaviour emerges and the ML estimator becomes normally distributed around the true parameter, with a variance shrinking at the rate 1/N The convergence above holds in distribution as N → ∞ and the limit is a centred Gaussian distribution with covariance given by the inverse of the Fisher information matrix. The latter is a p × p positive matrix which encapsulates how informative a single sample X from p θ is about θ. The limiting covariance I −1 (θ) is equal to the lower bound appearing in the the Cramér-Rao inequality for unbiased estimators. Estimators which satisfy equation (2) are called efficient and statistical theory shows that one cannot improve on their asymptotic accuracy, except on a measure zero set of the parameter space [51]. In particular if d(·, ·) is a locally quadratic (positive) loss function on Θ, then the risk of the MLE satisfies It is important to note that the asymptotic normality property (2) relies on the smoothness of the model and the fact that θ is an interior point of the parameter space Θ. For large N the ML estimator lies (with high probability) within in a small neighbourhood of θ of size 1/ √ N , and the parameter space looks like R p for all practical purposes. However, when Θ has a boundary, the ML estimator will not be asymptotically normal for parameters θ lying on the boundary, and one needs to analyse such models more carefully. This is the case in quantum tomography, when the unknown state is not full-rank and therefore lies on the boundary of S d , or is so close to the boundary that the asymptotic theory will kick in only for sample sizes that are much larger than those obtained in real experiments.

The maximum likelihood estimator in quantum tomography
We will now discuss in more detail the properties of the MLE in the specific case of quantum tomography. Given the measurement data encoded in the dataset D, the MLE is defined as the maximum of p τ (D|S ) over τ ∈ S d . By passing to log-likelihood and discarding the constant factorial terms in (1), we obtain the following form of the estimator where f (o|s) = N (o|s)/m are the empirical frequencies of the data. The MLE is commonly used in quantum tomography [10,14,44,32], and several implementation methods have been proposed including Hradil's iterative algorithm [64]. Our specific implementation uses the CVX package for disciplined convex programming on MATLAB [2]. The general asymptotic theory guarantees that the ML estimator is asymptotically normal for full rank states, i.e. the interior of S d . To get more insight into the general asymptotic behaviour for a given state ρ, we will choose a local parametrisation defined in terms of the matrix elements of ρ with respect to its eigenbasis. Let λ 1 ≥ λ 2 . . . , ≥ λ d > 0 be the eigenvalues of a full-rank state ρ. Then any neighbouring state can be written as where δρ θ is trace zero matrix which is completely parametrised by In particular, for any locally quadratic loss function d(·, ·) (e.g. Frobenius distance, or Bures distance) with weight matrix G(θ), the associated risk has the asymptotic behaviour Now, let us assume that the unknown state ρ is on the boundary of S d , and more precisely belongs to the space of rank r states S d (r) ⊂ M (C d ), for a fixed and known rank r ≤ d. In its own eigenbasis, ρ is the diagonal matrix of eigenvalues Diag(λ 1 , . . . , λ r , 0, . . . , 0), and any sufficiently close state is uniquely determined by its matrix elements in the first r rows (or columns). Intuitively this can be understood by noting that any rank-r state ρ in the neighbourhood of ρ can be obtained by perturbing the eigenvalues and performing a small rotation of the eigenbasis; in the first order of approximations these transformation leave the (d − r) × (d − r) lower-right corner unchanged so where C = O( A 2 , B 2 ). We therefore choose the (local) parametrisation ρ = ρ θ where only the matrix elements of A and B enter the parameter θ. For this model, any rank-r state corresponds to an interior point of the parameter space S d (r), and consequently the ML estimator obeys the asymptotic normality theory described above.
However, if the rank of ρ is not known in advance, then the diagonal block C of ρ needs to be included in the model in order to describe neighbouring states of higher rank. As ρ is a state, the block C is a positive matrix, and therefore its matrix elements are constrained. This complicates the analysis of the likelihood function, and invalidates the asymptotic normality of the ML estimator.
What should be the theory replacing asymptotic normality ? At the moment there isn't a complete answer to this question, but some important progress has been made in [66]. Following this work and [17], it is instructive to study an extended, 'non-physical' model in which the positivity requirement is dropped and (locally around ρ) the parameter space is taken to be that of selfadjoint matrices of trace-one M 1 sa (C d ). Therefore, the 'unphysical' parameter θ consists of the matrix elements of the blocks A, C, B (except one diagonal element due to normalisation), with ρ = ρ θ given by equation (7). We now make the 'mild' assumption that p ρ (o|s) > 0 for all pairs (o, s); indeed this condition is satisfied for 'generic' states but fails for states whose support is orthogonal to some of the measurement basis vectors. Under this assumption, we find that locally around ρ, we can define a statistical model given by bona-fide probability distributions p θ := p ρ θ . We can therefore define the unconstrained maximum likelihood (uML) estimator as in equation (3) where the optimisation is performed over the unconstrained local neighbourhood O ⊂ M 1 sa (C d ) of ρ, rather than over the space of states S d . Since ρ is on the boundary of S d , the probability thatρ uML falls outside the state space is significant and does not vanish in the asymptotic limit. In this case,ρ uML does not coincide with the 'constrained' or regular ML estimatorρ ML . In fact, each of them can be seen as the projection with respect to the relative entropy distance of the empirical distribution f onto the sets of probabilities M(O) and respectively M(S d ), where M : M (C d ) → R k·d is the measurement map.
Asymptotics. Since ρ corresponds to an interior point of the unconstrained parameter space, the general asymptotic normality theory applies again. In particular, the uML estimator is normally distributed around θ = 0 with variance (N I(ρ|S )) −1 , where I(ρ|S ) is the Fisher information (5) computed with respect to the unconstrained parametrisation. Moreover by Taylor expanding the log-likelihood function around the θ uML , and using the fact the the first derivative vanishes at this point, we obtain the second order approximation This implies tht for large N , the ML estimatorρ ML is the projection ofρ uML onto S d with respect to the quadratic distance determined by the Fisher information Moreover, the probability distribution of the MLE is obtained by projecting the Gaussian distribution corresponding toρ uML , onto S d . Although the projection can be computed efficiently using convex optimisation, finding a general characterisation of the resulting distribution seems to be a challenging problem [71]. Nevertheless, [66] shows that the problem is tractable in those cases where the metric d I is (approximately) isotropic, so that random matrix results such as the emergence of Wigner semicircle law can be used to treat the asymptotic theory. In section 8.1 we will use these ideas to study the asymptotic behaviour of the projected least squares estimator.

The least squares estimator
The least squares (LS) estimator [63,17] is based on the observation that the outcome probabilities p ρ (o|s) depend linearly on the state ρ. Let be the decomposition with respect to an orthonormal basis of M (C d ) consisting of selfadjoint matrices {τ i : 1 ≤ i ≤ d 2 }, and β := (β 1 , . . . , β d 2 ) T the corresponding vectorisation. Then the probabilities can be expressed as where X is a kd × d 2 fixed matrix depending on the measurement and the chosen vectorisation.
In an experimental setup we do not have access to the true probability vector. Instead from the d × k dataset of counts, we can compute the empirical probabilities f (o|s) := N (o|s)/m, whose expectations are Ef (o|s) = p ρ (o|s). Replacing probabilities vector with empirical frequencies where ∈ R dk is a mean zero vector of statistical noise. The noise distribution is irrelevant for the definition of the LS estimator, but we will return to this when discussing its error, and the generalised least squares estimator.
The least squares solution to the above system of equations is defined as the minimiser of the following optimisation problemβ and has the explicit formβ LS = (X T X) −1 · X T · f . The final estimateρ LS of the density matrix is then constructed from the estimated parameter vectorβ LS using equation (10). We note that the least squares estimator produces a state estimateρ LS = ρ β LS that is not necessarily a density matrix.
Least squares as a projection.

Least squares as inversion of a measure-and-prepare channel
The LS estimator was introduced above by choosing a particular vectorisation of the density matrix. While this is useful for numerical implementations, the disadvantage is that one loses the physical interpretation of the quantum state and the measurement map. Conceptually, it is useful to define the LS estimator in a "coordinate free" way which can be interpreted as the inversion of a certain measure-and-prepare map associated to the measurement [36]. Let s. If ρ is represented in its vectorised form, then the map D is given by the matrix X T X/|S |.
On the other hand, the preparation map P has matrix X T , so that the LS estimator can be expressed asρ From this expression we see immediately that Tr(ρ LS ) = 1 as consequence of the fact that f /|S | is a probability distribution and D is trace preserving. Additionally, we note the the accuracy of the LS estimator is linked to the properties of the channel D. In particular, in the case of Pauli measurements the channel is given by a tensor product D = C ⊗n of qubit depolarising channels [17,36] C : ρ → 1 3 ρ + 2 3 On the other hand, the measurements corresponding to the class of 2-designs (which includes covariant measurements, mutually unbiased bases, stabiliser states, symmetric informationally complete measurements) have associated channel given by [36] D :

Concentration bounds and asymptotic behaviour of the LS
For more insight into the structure of the LS estimator let us unpack equation (13) and note that ρ LS can be written as an average of independent, identically distributed matrices where o i|s are the outcomes of the measurements with respect to setting s.
In this form, the LS estimator is amenable to non-asymptotic matrix concentration techniques, as well as asymptotic normality theory. The following result [36] was obtained by applying matrix Bernstein inequalities [76] and provides a non-asymptotic confidence bound for LS with respect to the operator norm distance, see also [17]. Then where g(d) = 2d for 2−design measurements and g(d) d 1.6 for Pauli measurements.
The theorem provides upper bounds for risks with respect to commonly used loss functions such as the Frobenius distance (norm-two squared) and the trace-norm distance, see also [72] for related results. Indeed using the basic inequalities A 2 2 ≤ d A 2 and A 1 ≤ d A we obtain the upper bounds for the 2-design measurements and for the Pauli basis measurement, where c 1 , c 2 , C 1 , C 2 are a numerical constants. Moreover, in [36] it was shown that the log factor in (15) can be removed when we deal with covariant measurements. On the other hand, in section 3 we have shown that for the maximally mixed state, the optimal convergence rate for the Frobenius distance is of the order O(d 2 /N ); this means that the upper bound (15) cannot be improved, when seen as a uniform bound over all states. However, this leaves open the possibility that special classes of states can be estimated with higher accuracy that that provided by the LS estimator. Indeed, we will see that simple modifications of the LS estimator which take into account the positivity of the unknown state can significantly reduce the estimation errors for low rank states.
Asymptotics. Thanks to its simplicity, the LS estimator has a tractable asymptotic theory. As in the case of the concentration Theorem 4.1 the key point is that the error ρ LS is a sum of independent, identically distributed matrices given by equation (14). In the limit of large m each matrix element of ρ LS has a Gaussian distribution; in terms of the vectorised form we have where Ω := mCov [ |S ] is the (renormalised) covariance of the noise . Due to the independent structure of the measurement settings, the latter has a non-trivial block diagonal form with d × d blocks corresponding to the different settings s This allows us in principle to compute the asymptotic risk of an arbitrary quadratic loss function with weight matrix G (such as the Frobenius distance) as Since f is a vector of frequencies, it must satisfy k normalisation constraints, which is reflected in the fact each block Ω s is singular, with zero eigenvector 1 = (1, . . . , 1) T . Consequently, the covariance matrix V LS is singular with the zero eigenvector corresponding to the trace which is a fixed parameter. We will come back to this when discussing the generalised least squares estimator.

Asymptotic theory of LS for covariant measurements
While asymptotic normality tells us that the estimatorβ LS lies in an ellipsoid centred at β, it treats the estimator as a vector rather that as a matrix. For instance, it does not immediately provide an asymptotic theory for the operator norm error ρ LS − ρ . Random matrix theory provides us with other asymptotic results which take into account the matrix structure of the estimator. To explore this avenue we will make a simplifying assumption and place ourselves in the scenario of covariant measurements. The outcome of such a measurement is a one dimensional projection P = |ψ ψ| and the corresponding positive operator valued measure is M (dP ) := d · P · dP (18) where dP is the uniform measure over the space of one dimensional projections; the latter is the measure induced by the Haar measure over the unitaries U via the action P → U P U * . An alternative way of obtaining a measurement sample is to choose a random basis and perform a measurement with respect to the chosen basis.
In this setup, the channel D is given by [36] D : ρ → Tr(ρM (dP ))P = dP Tr(ρP )dP = 1 d + 1 (ρ + 1) and the LS estimator given by equation (14) can be written aŝ where P i are the independent outcomes of measurements.
We will be particularly interested in the behaviour of the LS estimator for low rank states, as well as states close to the maximally mixed one. Due to covariance it suffices to choose states which are diagonal in the standard basis {|i : i = 1, . . . , d}, and for simplicity we will restrict ourselves to rank-r states with equal eigenvalues ρ r = r i=1 |i i|/r. As in section 3.1 we write the LS estimator asρ where A, B, C are block matrices of mean zero, and I r is the r × r identity block. We will deal with each block separately.
Block C. By covariance, the distribution of the block C is invariant under unitary transformations in C d−r . As the LS estimator is unbiased, the matrix elements of C are centred. The off-diagonal elements have real and imaginary parts with variances equal to where we have written P = U |k k|U * for r < k = i, j and replaced the integration over projections with that over unitaries; the integral was then evaluated using Weingarten formulas [19]. Similarly, the variance and covariance of the diagonal elements are and all off-diagonal elements (of all blocks A, B, C) have zero covariance with other matrix elements. By the Central Limit Theorem, in the limit of large N , the off-diagonal elements of ρ LS become normally distributed and independent of all other elements. On the other hand, as the covariance of diagonal elements is non-zero, they converge to correlated Gaussian variables. However, if we also take the limit of large d we find that the covariance of the diagonal elements vanishes and the matrix C is distributed approximately as the Gaussian unitary ensemble (GUE). Universality results for random matrices [65] imply that the empirical distribution of the eigenvalues of C/ √ d − r converges weakly to Wigner's semicircle law on the interval [−2, 2], whose probability density is Indeed, panel a) of Figure 2 shows a good match between the histogram of the eigenvalues of the error block C/ √ N for a rank-one state of n = 7 atoms and N = 10 6 samples, and the corresponding Wigner distribution. Block A. Let us consider now block A of equation (19). For the same symmetry reasons, its off-diagonal elements (when r > 1) have real and imaginary parts with variance where all indices i, j, k, l are different. The diagonal elements have variance while the covariance of the diagonal elements is We note that for small r the diagonal elements do not become independent when N and d are large. However, if we also take r to be large, the covariance vanishes, as in the case of the C block. By the same argument, the distribution of the eigenvalues of A/ √ r converges to the Wigner semicircle law (20). This is illustrated in the right panel of Figure 2, for the case of the totally mixed state of n = 7 atoms.

Block B. The elements of block B have variances equal to
Eigenvalues distribution of the LS estimator. To better understand the LS estimator, it is instructive to study the distribution of its eigenvalues. This will also be used in the study of the projected least squares estimator furher on. Assuming N is large, we note thatρ LS can be written asρ where This means that in the leading order in N −1/2 the set of eigenvalues ofρ LS is the union of those of I r /r + A/ √ N and C/ √ N . In the case of large N, d and small rank r, the first group of eigenvalues are small fluctuations of size r/ √ N around 1/r; in particular these eigenvalues are positive with high probability; the second group of eigenvalues are distributed according to the Wigner law, and have maximum absolute value approximately equal to 2 (d − r)/N , and roughly half of them will be negative.
We can now evaluate the asymptotic risk of the LS estimator with respect to the chosen loss functions.
Frobenius risk. For the Frobenius distance, the asymptotic mean square error is In particular, the leading contribution to the Frobenius risk is d 2 /N , and the dependence with r is weak. As illustrated in panels a) and b) of Figure 3 the theoretical prediction match the simulation results for 5 and 6 atom states.
Operator norm risk. The operator-norm error of the LS estimator is Above we found that the entries of A, B, C become independent in the limit of large N , except for correlations between the diagonal elements which vanish if we additionally take the limit of large d. Moreover, the variances of the elements in the B block and the off-diagonal elements of A differ from those of the off-diagonal elements of C by factors (r + 2)/r and respectively (r + 1)/r. Therefore, the error block matrix as a whole is not distributed according to GUE ensemble. However, the universality of the Wigner semicircle law, the limit holds not only for highly symmetric ensembles like GUE but also for 'small' perturbations of this ensemble, e.g. random matrices with independent entries, whose variances do not deviate too much from a fixed value [23]. In particular for low rank r d, the total size of the blocks A, B, B * is of the order rd which is much smaller than than the size (d − r) 2 of C. Therefore, the asymptotic behaviour of the error matrix is determined by that of C and its spectrum converges to the Wigner law in the limit of large sample size and large dimension. In particular, the leading contribution to the norm error ρ LS − ρ r for low rank states is 2 d/N . A similar situation occurs for high rank states r ≈ d where the dominant variances are provided by the block A.
Indeed simulations for a pure and the fully mixed state of n = 7 atoms with N = 10 6 samples gave norm errors 0.0225 and respectively 0.0221, while the above theoretical prediction is 0.0226. Further simulation results for n = 5 and n = 6 atoms states of different ranks show a good match with the above estimate.
Trace norm risk. As in the case of the operator norm, the trace norm error can be estimated thanks to the fact that the eigenvalues of error matrix follow an approximate Wigner semicircle distribution. Using 1 2π we find that for low rank, or close to full rank states, the leading contribution to E( ρ LS − ρ 1 ) is 8d 3/2 /(3π √ N ). This rate agrees with the upper bound in equation (15), but provides an exact asymptotic constant for this rate. Indeed simulations with a pure and maximally mixed states of n = 7 atoms and N = 10 6 samples gave norm-one errors of 1.228 and respectively 1.229 while the theoretical prediction is 1.229.

Generalised least squares estimator
Consider the generic linear regression problem with x ∈ R a and unknown vector, y ∈ R b a vector of observations and n the 'noise' term with fixed and known covariance matrix C, which is assumed to be strictly positive. While the LS estimatorx LS is optimal in the sense of minimising the prediction error Ax − y , this is not the case when considering an estimation error e.g. the mean square error E x − x 2 , unless the covariance matrix is proportional to the identity. To recover the 'equal noise' situation we multiply equation (22) from the left by C −1/2 , to obtain The LS estimator of the last regression equation has the smallest covariance matrix among linear unbiased estimatorsx This is called the generalised least squares (GLS) estimator. when the noise distribution is Gaussian, the GLS estimator coincides with the MLE. In the i.i.d. setting where m samples are available, the GLS estimator is asymptotically normal and efficient GLS as a projection. Similarly to the LS estimator we consider the imageŷ GLS = Ax GLS of the GLS estimator x GLS . Thenŷ GLS is the projection of the data y onto the subspace Ran(A) ⊂ R b with respect to the covariance-dependent metric GLS for tomography. Let us return now to the tomography regression problem of the form given by equation (11). While the LS estimator is optimal in the sense of equation (12), in general it is not optimal in the estimation sense for any locally quadratic distance function. To remedy this we would like to construct a corresponding generalised least square estimator to take into account the nontrivial form of the noise covariance matrix Ω given by equation (17). However, there are two issues which prevent us from directly applying the GLS methodology to the tomography data. The first is that Ω is unknown since it depends on the true probabilities, and therefore on the unknown state ρ. We therefore propose to use an estimate of the covariance matrix instead. We describe the computation of this estimate in Appendix A.2. The second difficulty is that Ω is a singular matrix due the constraint o f (o|s) = 1 for each setting s. This means that for each setting s the vector |1 s ∈ C d is a zero eigenvector for Ω s and one should work within the orthogonal complement of |1 s . This can be achieved by choosing a block-diagonal isometry V : C k(d−1) → C k·d such that each block V s satisfies 1 s |V s = 0, and defining the new 'frequencies' vectorf with settings componentsf s = V * s f s . Similarly, we can also remove the trace-one constraint for states by choosing the vectorisation (10) such that the last basis element is In this case the state is uniquely determined by the free parameters {β i : 1 ≤ i ≤ d 2 − 1}. If we denote by J : C d 2 −1 → C d 2 the isometric embedding with respect to the standard basis, thenβ := J * β is the truncated vector of free parameters for ρ.
The regression problem can now be written in terms of the 'tilde' vectors and matrices f =Xβ +˜ whereX = V * XJ and˜ has covariance matrixΩ = V * ΩV which is non-singular. We can now apply the GLS methodology and define the estimator aŝ whereΩ is the (non-singular) estimated covariance matrix. We denoteρ GLS as estimate of the density matrix constructed from its vectorised formβ GLS . For future use, let us denotẽ

Asymptotic theory of GLS
What are the asymptotic properties ofρ GLS ? For large m, the noise distribution becomes Gaussian, and the covariance estimator converges to the actual covariance. The estimatorβ GLS becomes asymptotically normal as m → ∞ which means that the corresponding asymptotic average Fisher information per sample is I GLS = X * Ω−1X /k. In appendix A.1 we show that I GLS coincides with the Fisher information I(ρ|S ) defined as in equation (5) for the parametrisationβ. This equality implies that the error rates of the GLS estimator have the same asymptotic behaviour as those of the uML estimator, and both estimators satisfy asymptotic normality. In fact one can make the stronger statement that the two estimators are asymptotically close to each other.
Equivalence of GLS and uML. As stated above, the GLS can be interpreted as the projection of the dataf onto the image ofX in R k(d−1) with respect to the metric On the other hand, in section 3.1 we showed that for large N , we can define the uML estimator as the projection of f onto the hyperplane L d := M(M 1 sa (C d )), with respect to the relative entropy distance. Since in the first order of approximation the relative entropy is given by the quadratic formΩ −1 , the two projections become identical in the asymptotic limit.

Thresholded least squares estimator
The LS estimator (as well as GLS) suffers from the disadvantage that it does not necessarily produce a density matrix, i.e, a positive semi-definite estimate of trace one. While the significant eigenvalues can be estimated reasonably well with enough data, the LS estimator will typically have negative eigenvalues corresponding to small or zero eigenvalues of the true state. The thresholded least squares estimator (TLS) proposed in [17], improves the LS estimator by selecting the state which is the closest in Frobenius norm toρ LS and whose non-zero eigenvalues are above a certain threshold ν ≥ 0, i.e.
the set of density matrices with spectrum in {0} ∪ {[ν, 1]}. The choice of the statistical noise threshold is informed by the accuracy of the LS estimate, and a theoretical and a 'datadriven' choices for this threshold are detailed in [17], see also [69,22]. In practice it is found that estimator's performance improves if the threshold is allowed to be 'data-driven', by using cross-validation to choose the optimal value of the threshold, see Appendix A.2.
It turns out that computingρ TLS is very efficient and can be easily implemented. Let be the spectral decomposition ofρ LS where we assume that the eigenvalues are sorted in descending orderλ 1 ≥ . . . ≥λ d . The thresholded estimator has the same eigenvectors asρ TLS , and its eigenvalues can be computed in terms ofλ i as summarised in Algorithm 1. In words, the algorithm checks if the smallest eigenvalue ofρ LS is above the noise threshold, and if it is thenρ TLS =ρ LS ( note that by construction Tr(ρ LS ) = 1, so there is no need to renormalise the LS estimator). On the other hand if the smallest eigenvalue is below the threshold, it is set to zero and the remaining eigenvalues are suitably shifted accordingly so that their sum is equal to one. The final estimateρ TLS is constructed by replacing the eigenvalues ofρ LS with these thresholded eigenvaluesλ i (ν). The theoretical properties of the TLS estimator are presented in [17] and are similar to those of the projected least squares estimator which is discussed in more detail below.

Thresholded Generalised Least Squares Estimator (TGLS)
This estimator is obtained by using the GLS estimateρ GLS instead of the LS estimate as a starting point for the thresholding procedure. The constant for thresholding is chosen in the same way by using cross-validation. The advantage of TGLS is that it starts from a more accurate estimator than LS, which leads to smaller errors compared to TLS. The price to pay is that it is somewhat more involved computationally, although still faster than ML.
By comparing with (12) we see thatρ posLS is the projection ofρ LS with respect to the distance on matrices induced by the euclidian distance on measurement probabilities d M (ρ, τ ) = M(ρ)− M(τ ) . To our knowledge, with the exception of the results in [46], its theoretical properties have not been studied in detail. While its statistical performance greatly improves on that of the LS, the posLS estimator has the drawback that it cannot be expressed in a closed form, and its computational complexity is comparable to that of the ML estimator.

Positive least squares estimator for covariant measurements
Let us consider the special case of the covariant measurement defined in equation (18). This measurement maps states into probability distributions in an (almost) isometric way [36] Indeed using Weingarten formulas [19] we get for any traceless X As a corollary, we find that the posLS estimator is the projection of the LS estimator with respect to the Frobenius distance. Therefore, for this specific measurement, the posLS estimator coincides with the projected least squares estimator which will be discussed in section 8.

Positive generalised least squares estimator
This estimator is defined in much the same way as the posLS estimator, by restricting the minimisation in (5) to parameters that correspond to density matrices. In keeping with the discussion in section 5, we consider the truncated frequency vectorf . By analogy with the GLS estimator, the positive generalised least squares (posGLS) estimator is defined aŝ We will show thatρ posGLS is asymptotically equivalent to the ML estimatorρ ML .
Asymptotic equivalence of posGLS and ML. As in the case of GLS and uML, it is easier to work with the images in C k(d−1) of the different estimators through the (injective) mapM; we denote byp posGLS =M(ρ posGLS ) the probability vector corresponding to the posGLS estimator, and similarly for other estimators. Let us equip the the space of 'frequencies' Therefore,p posGLS is the projection with respect to d Ω off onto the convex set On the other hand, the GLS estimator is the projection off onto the hyperplaneM(M 1 sa (C d )) which contains P d . Therefore, by properties of projections on convex subsets we find thatp posGLS is the projection ofp GLS onto P d .
In section 5.1 we showed that the GLS estimator is asymptotically equivalent to the uML, as a consequence of the fact asymptotically with m both projections are determined by the Fisher information metric. Since posGLS and ML are obtained by applying the same projections onto the smaller space P d , they are also asymptotically equivalent. Unfortunately, this equivalence does not provide us with general estimation method which is more efficient than ML; the projection involved in posGLS does not seem to have closed form expression and requires a similar optimisation process as ML.

Projected least squares
Recently, it has been shown that the theoretical propertiesρ TLS are preserved even if the threshold ν is chosen to be zero [36]. The projected least squares (PLS) estimator is defined aŝ where the optimisation is performed over all states τ . As in the case of the TLS estimator, this optimisation can be performed efficiently, and it only involves computing the spectral decomposition ofρ LS and applying Algorithm 1 with ν = 0. The PLS estimator is therefore faster than the data-driven TLS and turns out to have quite similar behaviour to the latter. Note that in general PLS is different from posLS , as both can be seen as projections of the LS estimator with respect to different metrics. However, as noted before, the estimators coincide in the case of covariant measurements. In the next section we will study this scenario in more detail.
Using the LS concentration bound of Theorem 4.1, the following rank dependent norm-one bound for the PLS estimator was derived in [36], where the measurement is either a 2-design or the Pauli bases measurement.
where g(d) = 2d for 2−design measurements and g(d) d 1.6 for Pauli measurements.
This gives a upper bound of O(r 2 · d log d/ √ N ) on the convergence rate of norm-one for 2-design measurements and O(log r 2 · d 1.6 log d/ √ N ) for Pauli measurements. In the first case the log d can be removed for covariant measurements and the resulting rate is optimal in the sense that it achives general lower bounds from [37].

The asymptotic behaviour of PLS for covariant measurements
In this section we look in more detail at the PLS estimator in the context of covariant measurements defined in equation (18). We have already seen that Theorem 8.1 provides a concentration bound on the norm-one risk of the PLS estimator. While such results are very valuable thanks to their non-asymptotic nature, it is instructive and useful to also understand the asymptotic behaviour for large sample size N and dimension d. Indeed, in section 4.3 we showed how central limit and random matrix theory can be used to obtain tighter bounds on estimation risks of the LS estimator.
We will be particularly interested in the behaviour of PLS for low rank states. Due to covariance it suffices to choose states which are diagonal in the standard basis {|i : i = 1, . . . , d}, and for simplicity we will restrict ourselves to rank-r states with equal eigenvalues ρ r = r i=1 |i i|/r. As in section 4.3 we write the PLS estimator aŝ for some error blocksÃ,B,C whose dependence on the A, B, C blocks of the LS estimator needs to be determined.
Our analysis draws on the asymptotic theory of the LS described in section 4.3, and the arguments developed in [66] for the analysis of the ML estimator. We know thatρ PLS has the same eigenbasis asρ LS , and its eigenvalues are obtained from those ofρ LS by the truncation procedure which involves repeatedly setting to zero negative eigenvalues and shifting the remaining ones so that they add up to one. The following 3 step procedure aims to identify the leading contributions to PLS for low rank states, and N d 1.
1. Diagonalisation. Recall that for large N the LS estimator can be block diagonalised by means of a 'small' unitary rotation U , cf. equation (21) Therefore the eigenvalues ofρ LS can be grouped in two sets. The first is the set of eigenvalues of the block C/ √ N , which for large d are distributed according to the Wigner semicircle law and lie between ±2 (d − r)/N . The second set consists of the eigenvalues of the block I r /r + A/ √ N , which are small fluctuations of order r/ √ N around 1/r.

Truncation.
As long as N d 1 and r is small, the second set of eigenvalues is well separated from the first and it is very unlikely that any of these eigenvalues will be set to zero in the truncation process. Therefore, the cut-off point for the eigenvalues of where the integral corresponds to the sum of the eigenvalues of the block C/ √ N above q, after being shifted by q. Writing q = 2 d − r/N with = (r, d, N, a) < 1, we get the following equation for As the left side is smaller than r, the integral needs to be smaller than rπ/2(d − r), which means that is close to 1 for r d. This agrees with the intuition that a large part of the eigenvalues of the lower block will be set to zero by projecting the LS onto states. Further details on finding an (approximate) solution to (28) can be found in [66]. In particular, we will approximate by the deterministic solution of equation (28) in which a = Tr(A) is set to zero; indeed, for large d this will have a negligible effect on but will allow us to compute it in terms of r and d deterministically.
3. Rotation to original basis. Once the cut-off point q has been computed, the projection of the rotated LS estimator U * ρ LS U onto states can be written as and [X] + denotes the positive part of X. The PLS estimator is now obtained by performing the inverse rotationρ We can now estimate the asymptotic risk of the PLS estimator with respect to the chosen loss functions.
Frobenius risk. The mean square error scales as N −1 and its rescaled version is The contribution fromB is where the variance of B has been computed as in section 4.3; note that the term 2r (d − r)/N vanishes for N d. This error is of the order 2rd and can be seen as stemming from the uncertainty in estimating the eigenbasis of ρ r . ForÃ we write Finally, the term E C 2 2 is given by the sum of the squares of the remaining eigenvalues which can be approximated using the limit Wigner law as Although the last term appears to be of order (d−r) 2 , a careful analysis of the integral [66] shows that it is of lower order than rd due to the fact that 1 − leading term in a Taylor expansion is proportional to (r/(d − r)) 2/5 . Therefore, by adding the three terms, we find that the Frobenius risk scales as 6rd/N . This agrees with the non-asymptotic results of [36], and provides the exact asymptotic constant of the Frobenius rate. By comparing with the lower bound to the asymptotic minimax rate of 2r(d − r)/N derived in [17] we find that PLS is optimal for such states, within a constant which is at most 3.
For the n = 7 atoms state with r = 10 the Frobenius error is 0.041, compared to the theoretical one 0.043. For rank r = 1 state of 8 atoms with N = 10 5 samples, the Frobenius error was 0.017, while the theoretical prediction is 0.016. The simulations results for all ranks of n = 5 and n = 6 atoms are presented in panels a) and respectively b) of Figure 3.
Operator norm risk. Unlike the case of the LS estimator, the error matrix of PLS does not approach a Wigner distribution. For this reason, obtaining the asymptotic operator-norm risk seems difficult. However, the following lower bound follows from equation (29) Due to the truncation, the matrixC is positive and the largest eigenvalue is approximtely On the other hand,Ã is dominated by the term 2 √ d − r I r whose norm is 2 √ d − r . Since ≈ 1 for large d, the lower bound is 2 √ d − r . This complements the nonasymptotic upper bound of [36] which has rate O( √ d). Moreover, the lower bound seems to be a good approximation to the actual risk. A simulation with a rank r = 10 state of n = 7 atoms and N = 10 6 samples gave an operator-norm error of 0.02 while the lower bound is 0.01. For rank r = 1 state of 8 atoms with N = 10 5 samples, the operator-norm error was 0.12, while the lower bound is 0.08.
Trace-norm risk. As in the case of the norm-error, we could not derive the asymptotic expression of the trace-norm risk but we can formulate a lower bound based on the pinching inequality √ N ρ PLS − ρ r 1 ≥ Ã 1 + C 1 Note thatÃ = A−2 √ d − r I r , and the variance of the elements of A is of the order 1. Therefore, for r d the shift 2 √ d − r dominates the eigenvalues of A andÃ is a negative matrix. In first approximation its norm-one is then 2r √ d − r . On the other hand,C is positive and its trace can be approximated as where in last step we used equation (28) with a = 0. Therefore the lower bound to the tracenorm error is 4r √ d − r . For the n = 7 atoms state with r = 10 the trace-norm error was 0.33 while the lower bound is 0.21. For rank r = 1 state of 8 atoms with N = 10 5 samples, the operator-norm error was 0.24, while the lower bound is 0.16.
Bures risk. The Bures distance error can be expressed in terms of the blocksÃ,B,C as follows where we used the Taylor expansion in the last step. The leading term of the Bures risk is then where we used the fact that the LS block A has centred distribution. The second order term is The simulation results for all ranks states of n = 5 and n = 6 atoms are presented panels c) and respectively d) of Figure 3.

Comparative numerical simulations
In this section we detail the methodology and results of a general simulation study which compares the performance of the estimators presented in the previous sections for a set of states and against several estimation criteria. The states we consider are rank-r states with a fixed spectrum of r equal eigenvalues of magnitude 1/r each, and have randomly generated eigenvectors. This choice is motivated by the fact that such states are arguably harder to estimate among rank-r states (in analogy to the fact that an unbiased coin is harder to estimate than a biased one.
Additionally, having such a spectrum allows for a more consistent comparison of the estimators across several ranks.
We generate the above mentioned states for 3 or 4 qubits, and for a particular 'true state' we simulate a dataset D of counts from which the state is to be reconstructed. The outcome statistics depend on a few variables that we may vary, namely the type of measurement design (random basis vs Pauli), the number of repetitions per settings m, and in the case of the random basis measurements the total number k of measured settings. This allows us to study the performance of the estimators across several different combinations of variables: types of states, ranks, measurement design, number of repetitions per setting m, the total number of settings k and the number of ions n.
We will present plots of the estimated mean errors E [D(ρ, ρ)] of the estimators for the equal eigenvalues states. For each given rank r and number of qubits n, we generate a state with equal eigenvalues 1 r , . . .  Table 1, and the corresponding mean errors are estimated from 100 different runs of the experiment.
In order to make the results of the simulation study more accessible, we have made all plots for 3 and 4 atoms simulation available online via an interactive Rshiny application at this address: https://rudhacharya.shinyapps.io/plots/, while a selection is presented in the paper. In both cases we note that the Frobenius risk of the LS estimator has no significant dependence on the rank of the true state, and its performance is poor for small rank states. In contrast, the remaining estimators all show a scaling of the Frobenius risk with the rank of the true state. We also note that the performance of several of the estimators matches well with the Fisher-predicted risk.

Squared Frobenius norm
This is remarkable as the latter (6)  . This reflects the fact that in the low N case the asymptotic regime has not been reached and constrained estimators have an advantage even for full rank states which lie in the interior of the parameter space. Indeed for an eigenvalue λ of order 1/d and relatively small values of N , the unconstrained estimateŝ λ will have a standard deviation of order d/N which may be comparable or larger than the magnitude of the eigenvalues themselves. Therefore without the constraint of positivity such estimators may produce estimates withλ < 0. In contrast, for the constrained estimators the positivity constraint provides additional information when the eigenvalues are small. For large values of N however, the uncertainty in the eigenvalues is very small and the Fisher risk acts as a lower bound for all of the estimators.
Across both the Pauli and the random measurement designs we note that the performance of the posGLS and the ML estimators is very similar, and for large m almost identical. This confirms our asymptotic analysis which shows that the posGLS and the ML estimators are equivalent in the limit of large m, cf. section 7.2.

Bures and Hellinger distance
As the Bures distance D B (ρ, ρ) 2 is well defined only over density matrices, we plot the mean Bures errors only for the ML, TLS, TGLS, posLS , posGLS estimators. Figures 6 and 7 show the mean Bures errors for different sample sizes in the Pauli, and respectively random bases measurement design. For comparison, in Figures 8 and 9 we also plot the corresponding average Hellinger errors D H (λ, λ) 2 , cf. Table 1.
We note that the behaviour of the Hellinger errors is very similar to that of the Bures errors, with a better match for larger values of N . To give some intuition about this, we look at what happens in the case of qubits. The Bures distance between neighbouring qubit states can be a) n = 4, k = 100 and m = 100 b) n = 4, k = 100 and m = 1000 where Φ is the angle between the Bloch vectors of ρ andρ. In a non-adaptive measurement scenario such as those considered here, the Bloch vector parameters can be estimated at rate 1/ √ N which means that the second term on the right side of (31) is always of the order 1/N . However, for states which are very pure (λ ≈ 0) the Hellinger component has the dominant contribution to the Bures distance, and is responsible for the "non-standard" scaling of 1/ √ N in the minimax risk [4]. The simulation results indicate that a similar phenomenon may occur in higher dimensional systems. For full rank states, both the Bures and the Hellinger distance have a quadratic expansion, and in this sense a relation similar to (31) can be derived by splitting the Bures distance into quadratic contributions coming from changes in the eigenvalues and small basis rotations respectively, see also [61]. Alternatively, the block-matrix techniques used for analysing the Bures risk in section 8.1 can be extended to rank deficient states of arbitrary spectrum to show that the leading 1/ √ N contribution comes from the Hellinger D H (λ,λ) 2 .
Another noticeable feature across both the Pauli and the random bases measurement design is that for large N the mean errors are seen to be larger for states of middling ranks than for the full rank states, see Figures 6b, 7b. This is however not true for smaller values of N , as shown in Figures 6a, 7a. More precisely, for large N we see a steep increase from pure states to low rank states, followed by an almost linear decrease down to the full rank state. A similar behaviour has been uncovered in the analysis of the PLS estimator for covariant measurements in section 8.1. There, we found that for large d and low rank r, the eigenvalues distribution of the error block C of the LS estimator converges to the Wigner distribution, which allows us to compute the leading orders of the Bures risk of the PLS estimator. Since the covariant measurement arises in the large m limit of the random basis measurement [3], it is therefore expected that the risks behave similarly in the two cases. Our simulations indicate that the mechanism governing the asymptotics of the Bures risk seems to be robust with respect to the details of the measurement; although the Pauli basis measurement is not expected to produce an LS estimator whose error matrix is Wigner distributed, the Bures and Hellinger risks have similar characteristics as those of the covariant measurement. These findings are in line with those of [66], and are worthy of further theoretical investigation. a) n = 4, k = 81 and m = 100 b) n = 4, k = 81 and m = 1000 While the Bures risk for rank deficient states scales as 1/ √ N , for full rank states the Bures distance is locally quadratic, and standard asymptotic results imply that the convergence rate is in this case 1/N , cf. section 3.1. In general, D B (ρ θ , ρ θ+δθ ) = (δθ) T G B (θ)(δθ) + O( δθ 2 ), with weight matrix G B (θ) = F (ρ θ )/4, where F (ρ θ ) is the quantum Fisher information. For the maximally mixed state ρ r=d , and the parametrisation (4), the latter has 2 uncorrelated blocks [3] corresponding to off-diagonal parameters and respectively diagonal parameters On the other hand, the classical average Fisher information for random basis measurements is given by I = F/(d + 1) [17]. Since the ML estimator is asymptotically normal with variance

Trace-norm distance
The risks for the trace-norm distance exhibit a similar behaviour to those of the Frobenius distance, as illustrated in Figures 10 and 11. A noticeable feature is that all constrained estimators have close risks for large sample sizes in both the Pauli and the random bases setting.

Conclusions
In this paper we studied theoretical an practical aspects of quantum tomography methods. The unifying theme is that each estimator can be seen as a projection of the data onto a parameter space with respect to an appropriate metric. We considered estimators without positivity constraints (unconstrained maximum likelihood, least squares, generalised least squares) and with positivity constraints (maximum likelihood, positive least squares, thresholded least squares and projected least squares), and investigated the relationships between different estimators. While no estimator makes use of the state's rank (which is assumed to be unknown) the constrained estimators have significantly lower errors than the unconstrained estimators, for low rank states.
To better understand this behaviour we derived new asymptotic error rates for the least squares estimator and for the projected least squares estimators, for a class of given rank states and covariant measurements. These results capture the exact rate dependence on rank and dimension and complement non-asymptotic concentration bounds of [17,72,36], showing that PLS has strong optimality properties; for instance the leading contribution to the Frobenius risk is 6rd which is 'almost optimal', in that it is only 3 times larger than the minimax lower bound established in [17], which assumes that the rank is known. Our analysis has uncovered an interesting behaviour of the Bures error, which increases with respect to rank for small ranks and then decreases towards full rank states. The extensive simulations study shows that this behaviour (as well as the monotonic increase of other errors) is robust with respect to the measurement design.
Computationally, maximum likelihood and positive least squares involve an optimisation over states and are significantly slower than projected least squares which only requires the diagonalisation of the least squares matrix followed by a simple truncation procedure. Our results confirm and strengthen those of [36] and show that projected least squares is an attractive alternative to maximum likelihood, which is routinely used in practice. Additional improvements can be achieved by using generalised least squares as starting point, in a two steps procedure.
An interesting and practically relevant open question is whether any of the 'fast' estimators analysed here is statistically 'optimal' for more realistic measurements such as the Pauli bases. More generally, one can ask whether these methods can be adapted to non-informationally complete measurement scenarios, and other physically motivated lower dimensional statistical models.
• We choose a vector of constants C forming a mesh over the interval [0, 1]. For each value of C, and for each j ∈ {1, . . . , 5} we compute the following estimators. We hold out the dataset D j , and compute the TLS/TGLS estimateρ −j T(G)LS (C) for the dataset D −j = i =j D i , with threshold ν = C 4 m log 2 n+1 , cf. [17]. For each D j the LS estimateρ j LS is also evaluated.
• For all values of C, the empirical discrepancy is evaluated for a choice of error function D(ρ, ρ) CV D (C) = 1 5 Notice that the cross-validation procedure picks different constants for different choices of the error function. An important caveat here is that the Bures distance is not defined for the LS estimatesρ j LS , and therefore the procedure above cannot apply. Instead in the simulations we estimate the thresholding constantĈ D B using the ML estimate aŝ