Dual Adaptive Pyramid Network for Cross-Stain Histopathology Image Segmentation

Supervised semantic segmentation normally assumes that the test data lie in a domain similar to that of the training data. In practice, however, a domain mismatch between the training data and unseen data can lead to a significant performance drop. Obtaining accurate pixel-wise labels for images in different domains is tedious and labor intensive, especially for histopathology images. In this paper, we propose a dual adaptive pyramid network (DAPNet) for histopathological gland segmentation that adapts from one stain domain to another. We tackle the domain adaptation problem on two levels: 1) the image level considers differences in image color and style; 2) the feature level addresses the spatial inconsistency between the two domains. The two components are implemented as domain classifiers with adversarial training. We evaluate our approach on two gland segmentation datasets with H&E and DAB-H stains, respectively. Extensive experiments and an ablation study demonstrate the effectiveness of our approach on the domain adaptive segmentation task. We show that the proposed approach performs favorably against other state-of-the-art methods.


Introduction
Deep convolutional neural networks (DCNNs) have achieved remarkable success in medical image segmentation [5], which aims to identify and segment specific regions, such as organs or lesions in MR images and cellular structures or tumor regions in pathological images. Although excellent performance has been achieved on benchmark datasets, deep segmentation models generalize poorly to unseen datasets [10] because of the domain shift between the training and test data. Such domain shift is especially common in histopathology image analysis. For instance, a colon image stained with Hematoxylin and Eosin (H&E) looks markedly different from one stained with Diaminobenzidine and Hematoxylin (DAB-H) (Fig. 1). Thus, a model trained on one (source) dataset does not generalize well when applied to the other (target) dataset. Although fine-tuning the model with labelled target data could alleviate the impact of domain shift, manual annotation is a time-consuming, expensive and subjective process in the medical domain. Therefore, it is of great interest to develop algorithms that adapt segmentation models from a source domain to a visually different target domain without requiring additional labels in the target domain.
Domain adaptation algorithms have been developed to address the domain-shift problem. The main idea behind these methods is to align the visual appearance or the feature distribution of the source and target domains. Zhang et al. [11] render the source image with the target domain "style", and then learn domain-invariant representations in an adversarial manner. AdaptSeg [9] aligns the images of the two domains in the structured output space. CyCADA [3] unifies adversarial adaptation methods with cycle-consistent image translation techniques.
In this paper, we propose a DCNN-based domain adaptation algorithm for histopathology image segmentation, referred to as Dual Adaptive Pyramid Network (DAPNet). The proposed DAPNet is designed to reduce the discrepancy between two domains by incorporating two domain adaptation components, one at the image level and one at the feature level. The image-level adaptation considers the overall differences between the source and target domains, such as image color and style, while the feature-level adaptation addresses the spatial inconsistency between the two domains. In particular, each component is implemented as a domain classifier with an adversarial training strategy to learn domain-invariant features.
The contributions of this work can be summarized as follows. First, we develop a deep unsupervised domain adaptation algorithm for histopathology image segmentation. Second, we propose two domain adaptation components that alleviate the domain discrepancy at the image and feature levels based on pyramid features. Third, we conduct extensive experiments, in which the proposed DAPNet outperforms other state-of-the-art methods.

Method
In this work, we aim to learn a gland segmentation model from images with one stain type and apply the learned model to images with a different stain. The training data form the source domain $S$, while the test data with a different stain type form the target domain $T$. In the source domain $S$, we have access to the stained images $X_S$ as well as the corresponding ground-truth labels $Y_S$. In the target domain $T$, we only have the unlabelled stained images $X_T$.

Model Overview
An overview of the proposed DAPNet is illustrated in Fig. 2. It contains a semantic segmentation network $G$ and two adversarial learning modules $D_{img}$ and $D_{feat}$. During training, both the source images $x_s$ and the target images $x_t$ are fed into the network $G$ as inputs. The source images and the corresponding labels are used to optimize $G$ for the segmentation task, while both source and target images are used to optimize the domain adaptation losses by adversarial learning with $D_{img}$ and $D_{feat}$.

Segmentation Network
As shown in Fig. 2, our segmentation network consists of three components. First, a dilated ResNet-18 [2] is used as the backbone to encode the input images. To enlarge the receptive field of the model, we then apply a Pyramid Pooling Module (PPM) from PSPNet [12] to the last layer of the backbone network.
The PPM pools the feature map into representations at several pyramid levels. The features at the different levels are then upsampled and concatenated to form the pyramid pooling global feature. Furthermore, we adopt skip connections from U-Net [7] and a pyramid feature fusion architecture to produce the final segmentation. The decoded feature maps are upsampled to the same spatial resolution and merged by concatenation in a pyramidal way. The output feature maps pass through a 1 × 1 convolutional layer that reduces the channel dimension to 512. Our method thus involves downsampling pyramid feature extraction and upsampling pyramid feature fusion. In contrast, CyCADA needs to first map the source training data into the target domain at the pixel level. The segmentation task is learned by minimizing a combination of the standard cross-entropy loss and a Dice loss on images from the source domain, computed between the ground-truth labels $y_s$ and the predicted labels $\hat{y}_s$, with $\alpha$ as the trade-off parameter between the two terms.
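The combined segmentation objective described above can be sketched for the binary gland/background case as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: it assumes per-pixel foreground probabilities, a soft Dice loss of the form 1 minus the Dice coefficient, and that $\alpha$ scales the Dice term. All function names are hypothetical.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(probs) + sum(labels) + smooth)."""
    overlap = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)

def segmentation_loss(probs, labels, alpha=1.0):
    """Cross-entropy plus alpha times Dice loss (assumed placement of alpha)."""
    return cross_entropy(probs, labels) + alpha * dice_loss(probs, labels)

# Tiny example: four pixels, ground truth vs. a perfect and an inverted prediction.
labels = [1, 0, 1, 0]
perfect = segmentation_loss([1.0, 0.0, 1.0, 0.0], labels)
inverted = segmentation_loss([0.0, 1.0, 0.0, 1.0], labels)
```

With `alpha=1`, the value reported in the implementation details, both terms contribute equally. A perfect prediction drives both terms towards zero, while an inverted prediction is penalized by both.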

Domain Adaptation
Image-level Adaptation. In this work, the image-level representation refers to the PPM outputs of the segmentation network $G$. Image-level adaptation helps to reduce the shift caused by global image differences, such as image color and style, between the source and target domains. To eliminate the domain distribution mismatch, we employ a discriminator $D_{img}$ to distinguish the PPM features of source images from those of target images. At the same time, $D_{img}$ guides the training of the segmentation network in an adversarial manner. In particular, we employ PatchGAN [4], a fully convolutional neural network operating on image patches, which yields a two-dimensional feature map as the discriminator output. The loss for training $D_{img}$ is defined on $p_s$ and $p_t$, the PPM outputs of the segmentation network $G$ for the source and target domains, respectively.
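A common way to write a domain-classifier objective of this kind is sketched below. The exact form and the label convention (source as 1, target as 0) are assumptions made for illustration, with $D_{img}(\cdot)$ denoting the PatchGAN output averaged over its spatial positions:

```latex
\mathcal{L}_{img}(G, D_{img}) =
  \mathbb{E}_{x_s \sim X_S}\big[\log D_{img}(p_s)\big]
  + \mathbb{E}_{x_t \sim X_T}\big[\log\big(1 - D_{img}(p_t)\big)\big]
```

Under this convention $D_{img}$ is trained to maximize the expression, while $G$ is trained to minimize it so that the source and target PPM features become indistinguishable.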
Feature-level Adaptation. The feature-level representation refers to the fused feature maps before they are fed into the final segmentation classifier. Aligning the feature-level representations helps to reduce the segmentation differences in both global layout and local context. Similar to image-level adaptation, we train a domain classifier $D_{feat}$, formulated as a PatchGAN, to align the feature-level distributions. Let us denote the final fused feature representations as $f_s$ and $f_t$ for the source and target domains, respectively. The loss for $D_{feat}$ is defined on $f_s$ and $f_t$ in the same adversarial manner.
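Because both discriminators are PatchGANs, each produces a two-dimensional map of per-patch domain scores rather than a single value. The sketch below shows how a least-squares loss could be computed over such maps, in the spirit of the LSGAN variant mentioned in the implementation details. The target values (1 for source, 0 for target), the factor 0.5 and all function names are assumptions for illustration, not the authors' code.

```python
def mean_squared_to(score_map, target):
    """Mean squared distance between every patch score and a target label."""
    values = [v for row in score_map for v in row]
    return sum((v - target) ** 2 for v in values) / len(values)

def discriminator_ls_loss(src_scores, tgt_scores, src_label=1.0, tgt_label=0.0):
    """Least-squares loss minimized by a patch discriminator (assumed labels)."""
    return 0.5 * (mean_squared_to(src_scores, src_label)
                  + mean_squared_to(tgt_scores, tgt_label))

def adversarial_ls_loss(tgt_scores, src_label=1.0):
    """Assumed term for the segmentation network: make target patches look like source."""
    return 0.5 * mean_squared_to(tgt_scores, src_label)

# 2 x 2 score maps, one score per patch of the input feature map.
src = [[0.9, 0.8], [1.0, 0.7]]
tgt = [[0.1, 0.2], [0.0, 0.3]]
d_loss = discriminator_ls_loss(src, tgt)  # small: the discriminator separates the domains well
g_loss = adversarial_ls_loss(tgt)         # large: target features are still easy to identify
```

The same functions would apply unchanged to the outputs of $D_{img}$ (computed from $p_s$, $p_t$) and $D_{feat}$ (computed from $f_s$, $f_t$), since only the input features differ.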

Overall Training Objective
We integrate the segmentation module for source images and the two domain adaptation modules to train the networks $G$, $D_{img}$ and $D_{feat}$ jointly. The overall objective combines the segmentation loss with the image-level and feature-level adaptation losses, weighted by two trade-off parameters $\lambda_1$ and $\lambda_2$. The resulting min-max game is optimized by adversarial training, and $G$ is used to segment images in the target domain at test time.
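Written out, an objective of this kind typically takes the following form, where $\mathcal{L}_{seg}$ is the source-domain segmentation loss and $\mathcal{L}_{img}$, $\mathcal{L}_{feat}$ are the image-level and feature-level adversarial terms. The symbol names and the sign convention (the discriminators maximize, $G$ minimizes) are assumptions for illustration:

```latex
G^{*} = \arg\min_{G} \; \max_{D_{img},\, D_{feat}} \;
  \mathcal{L}_{seg}(G)
  + \lambda_1 \, \mathcal{L}_{img}(G, D_{img})
  + \lambda_2 \, \mathcal{L}_{feat}(G, D_{feat})
```

Small values of $\lambda_1$ and $\lambda_2$ keep the supervised segmentation term dominant, so the adversarial terms act as regularizers on the learned features.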

Implementation details
Our DAPNet uses 3 × 3 kernels for convolutional operations, each followed by a batch normalization layer. We train all models with the Adam optimizer and a batch size of 4 for 300 epochs. We randomly crop image patches of size 256 × 256 for training. The initial learning rate is $10^{-3}$; it is kept constant for the first 150 epochs and linearly decayed to zero over the next 150 epochs. The hyper-parameters $\alpha$, $\lambda_1$ and $\lambda_2$ are set to 1, 0.002 and 0.005, respectively. Our adversarial training is based on LSGAN [6], which replaces the negative log-likelihood objective with a least-squares loss. This loss yields more stable training and higher-quality results.
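The learning-rate schedule described above can be expressed as a small function. This is a sketch that assumes the rate is updated once per epoch, that epochs are 0-indexed, and that the rate reaches exactly zero at epoch 300; the function name and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-3, constant_epochs=150, decay_epochs=150):
    """Learning rate for a 0-indexed epoch: constant, then linear decay to zero."""
    if epoch < constant_epochs:
        return base_lr
    progress = (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, 1.0 - progress)

# Sample the schedule at a few points of the 300-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 149, 150, 225, 300)}
```

In a deep learning framework, such a function would normally be passed to a scheduler that multiplies the base rate by a per-epoch factor.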

Results
We evaluate the performance of our DAPNet for gland segmentation in both adaptation directions. In particular, we denote adaptation from Warwick-QU (source) to GlandVision (target) as Warwick-QU → GlandVision and the reverse direction as GlandVision → Warwick-QU. In each case, the test images in the target domain are used for evaluation. We provide extensive experiments, including comparisons with state-of-the-art methods and an ablation study.
We compare our DAPNet with three state-of-the-art unsupervised domain adaptation methods: CycleGAN [13], CyCADA [3] and AdaptSeg [9]. The comparison with CycleGAN is carried out in two stages. We first use CycleGAN to transform the source domain images to the target domain, and then use the transformed images along with the corresponding source-domain labels to train the segmentation network G. We report the segmentation results using Pixel Accuracy (Acc.) and Intersection over Union (IoU) in Table 1. We observe that DAPNet outperforms all the other methods for domain adaptation between Warwick-QU and GlandVision in both directions. We repeated the model training and testing three times with random parameter initializations and the same hyper-parameters. All tests show that our proposed method consistently outperforms the other methods with statistical significance (paired t-test with p < 0.01). Specifically, when adapting from Warwick-QU to GlandVision, the averaged accuracy and IoU are 0.88 ± 0.0083 (Mean ± SD) and 0.68 ± 0.0021, respectively. When adapting from GlandVision to Warwick-QU, the averaged accuracy and IoU are 0.76 ± 0.0105 and 0.57 ± 0.0108, respectively. Moreover, Fig. 3 presents qualitative results on two example images for each adaptation direction. Both CycleGAN and CyCADA can successfully detect the gland structures, but the predicted masks contain irregular spot noise. AdaptSeg with only image-level adaptation can hardly segment the gland boundaries clearly. Our proposed DAPNet produces significantly better predictions with accurate layout.
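For reference, the two reported metrics can be computed from a predicted mask and a ground-truth mask as sketched below. The sketch assumes binary masks (1 for gland, 0 for background) and an IoU computed for the gland class only; whether Table 1 reports foreground IoU or a mean over classes is not stated here, so that choice is an assumption. The function names are hypothetical.

```python
def _pixel_pairs(pred, truth):
    """Flatten two equally sized 2-D masks into (prediction, truth) pairs."""
    return [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground truth."""
    pairs = _pixel_pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def iou(pred, truth, positive=1):
    """Intersection over Union for the positive (gland) class."""
    pairs = _pixel_pairs(pred, truth)
    intersection = sum(p == positive and t == positive for p, t in pairs)
    union = sum(p == positive or t == positive for p, t in pairs)
    return intersection / union if union else 1.0  # both masks empty: perfect match

pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
acc = pixel_accuracy(pred, truth)  # 4 of 6 pixels agree
score = iou(pred, truth)           # 2 overlapping gland pixels out of 4 in the union
```

IoU is the stricter of the two measures here because background pixels that are trivially correct raise the accuracy but do not enter the IoU of the gland class.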
We further conduct an ablation study to demonstrate the necessity of the two domain adaptation components of our model. In particular, we compare DAPNet with three variants: the model trained without domain adaptation modules (DAPNet-NA), with only the image-level adaptation module (DAPNet-IA), and with only the feature-level adaptation module (DAPNet-FA). As shown in Table 1, the performance of DAPNet-NA drops significantly due to the domain shift, and the best results are achieved with the full DAPNet. The two adaptation components therefore effectively alleviate the discrepancy between the two domains. We also show that the domain adaptation modules can boost the segmentation performance on the target domain without affecting the results on the source domain (see Fig. 4).

Conclusions
In this paper, we study the unsupervised domain adaptive segmentation task for histopathology images. We have proposed a dual adaptive pyramid network with two domain adaptation components trained adversarially at the image and feature levels. The model is trained without target domain labels, and the test procedure is the same as for a standard segmentation network. Experimental results show that the proposed DAPNet effectively boosts the performance on unlabelled target datasets and outperforms other state-of-the-art approaches.