Abstract: Data augmentation is an essential part of training discriminative Convolutional Neural Networks (CNNs). A variety of augmentation strategies, including horizontal flips, random crops, and principal component analysis (PCA), have been proposed and shown to capture important characteristics of natural images. However, while data augmentation is commonly used for deep learning in medical imaging, little work has been done to determine which augmentation strategies best capture medical image statistics and thus lead to more discriminative models. This work compares augmentation strategies and shows that the extent to which an augmented training set retains properties of the original medical images determines model performance. Specifically, augmentation strategies such as flips and Gaussian filters lead to validation accuracies of $84\%$ and $88\%$, respectively. In contrast, a less effective strategy such as adding noise leads to a significantly worse validation accuracy of $66\%$. Finally, we show that the choice of augmentation affects mass generation.
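As an illustration of three of the strategies named above (flips, Gaussian filters, and added noise), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the `sigma` and `std` parameters and their default values, and the reflect padding at image borders are all assumptions made for this example, and the abstract does not state the settings used in the study.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror a 2D image left to right."""
    return image[:, ::-1]


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_filter(image, sigma=1.0):
    """Blur a 2D image with a separable Gaussian (sigma is an assumed default)."""
    radius = int(3 * sigma + 0.5)
    kernel = gaussian_kernel_1d(sigma, radius)
    # Reflect-pad so the output keeps the input shape (an assumed border policy).
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    # Filter along rows, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def add_gaussian_noise(image, std=0.1, rng=None):
    """Add zero-mean Gaussian noise (std is an assumed default)."""
    rng = np.random.default_rng() if rng is None else rng
    return image + rng.normal(0.0, std, size=image.shape)
```

Each function takes a single 2D grayscale array and returns an array of the same shape, so an augmented training set could be built by applying one of them to every original image patch.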
Learning Objective 1: Evaluate data augmentation techniques for training medical image classifiers, particularly by studying the performance of deep neural networks in classifying mass/non-mass on mammograms.
Zeshan Hussain (Presenter)
Francisco Gimenez, Stanford University
Darvin Yi, Stanford University
Daniel Rubin, Stanford University