SC-GAN: 3D self-attention conditional GAN with spectral normalization for multi-modal neuroimaging synthesis

Task: Multimodal (PET & multi-modal MR) 3D medical image synthesis

Description: A novel 3D conditional generative adversarial network (GAN) designed and optimized for the application of multimodal 3D neuroimaging synthesis. The model uses spectral normalization and feature matching to stabilize the training process and ensure optimization convergence (SC-GAN). A self-attention module was also added to model the relationships between widely separated image voxels.
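As a minimal sketch of the stabilization idea, spectral normalization can be applied to each discriminator layer; in PyTorch this is a one-line wrapper around a convolution. The layer sizes and channel counts below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator front-end with spectral normalization on each
# 3D convolution; 2 input channels stand in for a multimodal (e.g. T1w + FLAIR) input.
disc_block = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv3d(2, 32, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
)

x = torch.randn(1, 2, 32, 32, 32)  # (batch, channels, D, H, W)
out = disc_block(x)
print(out.shape)  # torch.Size([1, 64, 8, 8, 8])
```

Spectral normalization rescales each layer's weight matrix by its largest singular value, bounding the discriminator's Lipschitz constant, which is what stabilizes adversarial training.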

The model is constructed as follows: First, a 2D conditional GAN is extended into a 3D conditional GAN (cGAN). Next, a 3D self-attention module is added to generate 3D images with preserved brain structure and reduced blurriness in the synthesized images. This module models the similarity between adjacent and widely separated voxels of a 3D image. Spectral normalization, feature matching loss and brain area RMS error (RMSE) loss are also introduced to stabilize the network training process and prevent overfitting.
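The 3D self-attention module described above can be sketched in PyTorch in the style of SAGAN-type attention extended to volumes. The channel-reduction factor of 8 and the zero-initialized blending weight are common conventions assumed here, not confirmed details of SC-GAN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    """Sketch of a 3D self-attention module: each voxel attends to every
    other voxel, modeling similarity between adjacent and distant regions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv3d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv3d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend, starts at 0

    def forward(self, x):
        b, c, d, h, w = x.shape
        n = d * h * w  # total number of voxels
        q = self.query(x).view(b, -1, n)   # (b, c//8, n)
        k = self.key(x).view(b, -1, n)     # (b, c//8, n)
        v = self.value(x).view(b, c, n)    # (b, c, n)
        # Attention map: similarity of every voxel pair, softmax-normalized.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, n, n)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, d, h, w)
        return self.gamma * out + x  # residual connection to the input

attn = SelfAttention3D(16)
x = torch.randn(2, 16, 4, 4, 4)
y = attn(x)
print(y.shape)  # same shape as input: torch.Size([2, 16, 4, 4, 4])
```

Because the attention map is computed over all voxel pairs, its memory cost grows quadratically with volume size; this is the computational bottleneck the paper's limitations section alludes to.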

The novel features of the SC-GAN model are:

  1. the introduction of a 3D self-attention module into a 3D conditional GAN to generate high-accuracy synthesis results with a stable training process. Smooth training is achieved through a series of stabilization techniques and a modified loss function; and
  2. testing on multiple datasets across different synthesis tasks, with support for multimodal input, which can be generalized to a wide range of image synthesis applications.
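The modified loss function mentioned in point 1 combines the adversarial term with feature matching and brain-area RMSE losses. Minimal sketches of the latter two follow; the function names, signatures, and the per-layer averaging are hypothetical, hedged reconstructions of the idea rather than the paper's exact formulation:

```python
import torch

def feature_matching_loss(real_feats, fake_feats):
    """L1 distance between discriminator feature maps of real and synthesized
    images, averaged over layers (a sketch of the feature matching idea)."""
    return sum(torch.mean(torch.abs(r - f))
               for r, f in zip(real_feats, fake_feats)) / len(real_feats)

def brain_rmse_loss(target, pred, brain_mask):
    """RMS error restricted to voxels inside a binary brain mask, so the
    background does not dilute the error signal."""
    diff = (target - pred)[brain_mask.bool()]
    return torch.sqrt(torch.mean(diff ** 2))

target = torch.ones(4, 4, 4)
pred = torch.zeros(4, 4, 4)
mask = torch.ones(4, 4, 4)
print(brain_rmse_loss(target, pred, mask))  # tensor(1.)
```

Restricting the RMSE to the brain mask is what the description above calls the "brain area RMS error loss"; feature matching penalizes the generator when intermediate discriminator activations differ between real and synthetic volumes.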
Schematic view of the self-attention conditional generative adversarial network (SC-GAN). The first layer represents the input data. The attention map exploits the similarity of each pair of convolved images and combines it with the input data to generate the output of the self-attention module. Abbreviation: conv, convolution (obtained from model’s reference publication)

Validation: The performance of the network was evaluated on the ADNI-3 dataset, in which the proposed network was used to predict amyloid beta PET (Aβ-PET) images, fractional anisotropy (FA) maps, and mean diffusivity (MD) maps from multimodal MRI. SC-GAN was then applied to a multidimensional diffusion MRI experiment for a super-resolution application. Three image-quality metrics were used to evaluate synthesis performance: normalized RMSE (NRMSE), peak SNR (PSNR), and structural similarity (SSIM).
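The first two of these metrics are straightforward to compute; the NumPy sketches below assume range normalization for NRMSE and the target's peak value for PSNR, which are common conventions but not confirmed choices of the paper (SSIM is more involved and is available in libraries such as scikit-image):

```python
import numpy as np

def nrmse(target, pred):
    """Normalized RMSE: RMS error divided by the target's intensity range
    (range normalization assumed here)."""
    rmse = np.sqrt(np.mean((target - pred) ** 2))
    return rmse / (target.max() - target.min())

def psnr(target, pred):
    """Peak signal-to-noise ratio in dB, using the target's peak intensity."""
    mse = np.mean((target - pred) ** 2)
    return 10 * np.log10(target.max() ** 2 / mse)

t = np.array([0.0, 1.0])
p = np.array([0.0, 0.5])
print(round(nrmse(t, p), 3), round(psnr(t, p), 2))  # 0.354 9.03
```

Lower NRMSE and higher PSNR indicate a closer match between the synthesized and target images.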

Results: In general, SC-GAN outperformed other state-of-the-art GAN networks, including 3D conditional GAN, in all three tasks across all evaluation metrics. The prediction error of SC-GAN was 18%, 24%, and 29% lower than that of 2D conditional GAN for the fractional anisotropy, PET, and mean diffusivity tasks, respectively. The ablation experiment showed that the major contributors to the improved performance of SC-GAN are the adversarial learning and the self-attention module, followed by the spectral normalization module. In the super-resolution multidimensional diffusion experiment, SC-GAN provided superior prediction in comparison to 3D U-Net and 3D conditional GAN.

Qualitative assessment of the three tasks. Images are the result of applying different GANs to T1w and FLAIR input images to predict Aβ-PET (A), FA (B), and MD (C), with absolute-value error maps between synthesis results and target for the FA (D) and MD (E) tasks. Target PET/FA/MD are also illustrated for comparison. SC-GAN synthesized the results most similar to the targets among the compared networks. In the FA task (B), the 2D network demonstrated continuous distortion (red arrow), and 3D cGAN resulted in an oversmoothed image (blue arrow showing a partial-volume effect between fiber bundles of the cingulum and corpus callosum). Attention cGAN failed to capture high-intensity FA across the white matter (yellow arrows). The green dotted circle shows that, unlike the other networks, SC-GAN was able to capture brainstem details. In the MD task (C), the 2D network generated artificially sharp boundaries (red arrow) and 3D cGAN produced substantial striping artifact (blue arrow). High error rates at the brain boundaries correspond to regions affected by EPI distortion and to the imperfect brain masks used in the experiment. The figure shows that the result generated by SC-GAN has the lowest intensity error among the compared networks (obtained from model’s reference publication).

Claim: The SC-GAN network is an end-to-end medical image synthesis network that can be applied to high-resolution, multimodal input images. The SC-GAN was designed to handle multimodal (multichannel) 3D images as inputs. It was shown that SC-GAN significantly outperformed state-of-the-art techniques, enabling reliable and robust deep learning–based medical image synthesis for a wide range of applications.

The high computational cost is the main limitation of SC-GAN: it requires lengthy training and substantial computational resources, particularly GPU memory. Sparse attention matrix computation could be a potential remedy. A future research direction could focus on generalizing SC-GAN, for example by adopting a knowledge distillation mechanism.

Data Availability: The Alzheimer’s Disease Neuroimaging Initiative ADNI-3 dataset was used. The tools and codes used in this project are mostly available through the Laboratory of Neuroimaging (LONI) pipeline. The FreeSurfer and FSL open toolkits were employed for preprocessing and data preparation. The proposed SC-GAN deep learning network is also made publicly available.

GitHub Page

Documentation

Reference Publication
