COVID-19 X-Ray Feature Inference using CNNs

Aim and Objectives

The main aim of this project is to implement a Convolutional Neural Network (CNN) that classifies X-Ray images, determining whether a given radiograph shows pneumonia or not. Because neural networks are largely opaque, the reasons why they classify an image in a particular way are not fully understood. For this reason, I’ve decided to analyse the model outputs at different depths of the network. It would also be advantageous to gain some insight into the features that the model is learning, in order to visualise what may constitute a diagnosis of pneumonia. This will be done by visualising both the input images and the corresponding ‘learned’ features, obtained by examining the outputs of the respective convolutional layers.

Specific Objective(s)

The Dataset

The dataset I have chosen to use is a collection of grayscale paediatric X-Ray images, consisting of two categories: ‘pneumonia’ and ‘normal’. There are a total of 5856 JPEG images, with pre-allocated folder labels and train-test-validation splits.

There is good variation within the respective subsets, with both anterior and posterior images included. The images were selected from retrospective cohorts of paediatric patients, aged between one and five years, from Guangzhou Women and Children’s Medical Center, Guangzhou.

In terms of image accuracy and quality, all images were initially screened by physicians, and any low-quality, unreadable or ambiguous scans were removed. The images were then classified by expert physicians before being assigned their respective labels.

There are a few small issues with the data that may cause problems later during training. The first is a significant class imbalance: there are 1583 images classed as ‘normal’, in contrast to 4273 images classed as ‘pneumonia’, which equates to roughly 2.7 times as many images in the ‘pneumonia’ class as in the ‘normal’ class. It is not yet clear whether this will impact performance, but it will need to be considered prior to training the network.
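One common way to account for an imbalance like this is to weight each class inversely to its frequency in the loss function. The snippet below is only a minimal sketch of that idea, not necessarily what the notebook does: the function name `balanced_class_weights` is hypothetical, and the weighting formula (total / (number of classes × class count)) is one widely used convention that I am assuming here. The class counts are the ones quoted above.

```python
# Class counts as quoted in the text above.
counts = {"normal": 1583, "pneumonia": 4273}


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Hypothetical helper: rarer classes get a larger weight, so that each
    class contributes equally to a weighted loss overall.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(counts)
imbalance_ratio = counts["pneumonia"] / counts["normal"]

print(f"imbalance ratio: {imbalance_ratio:.2f}")
for label, weight in weights.items():
    print(f"{label}: weight {weight:.3f}")
```

With these weights, the weighted count of each class is identical, so neither class dominates the loss simply by being more numerous. Whether this actually helps is something that would need to be checked empirically during training.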

Another minor issue concerns the image sizes. Each image is of a different size, depending on whether it was taken anteriorly or posteriorly, which leads to slight variance in the aspect ratio. Because of this, I may need to consider an algorithm to determine ‘regions of interest’ in the images prior to training. This will also need to be considered when pre-processing the data, but due to the uniform nature of X-Ray images in general, cropping or downsampling shouldn’t be an issue.
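To make the cropping and downsampling step concrete, here is a small sketch under stated assumptions: it centre-crops an image to a square and then downsamples it with nearest-neighbour sampling. The function names `center_crop_box` and `crop_and_downsample` are hypothetical, the image is represented as a plain list of pixel rows to keep the example self-contained, and in practice an imaging library would normally do this with better interpolation.

```python
def center_crop_box(width, height, target_aspect=1.0):
    """Return (left, top, right, bottom) of the largest centred box
    whose width / height ratio equals target_aspect."""
    if width / height > target_aspect:
        # Image is too wide: keep the full height and trim the sides.
        new_w, new_h = round(height * target_aspect), height
    else:
        # Image is too tall: keep the full width and trim top and bottom.
        new_w, new_h = width, round(width / target_aspect)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


def crop_and_downsample(image, size):
    """Centre-crop a 2D list of pixels to a square, then resize it to
    size x size using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    left, top, right, _ = center_crop_box(width, height)
    side = right - left  # the crop is square, so one side length suffices
    return [
        [image[top + (i * side) // size][left + (j * side) // size]
         for j in range(size)]
        for i in range(size)
    ]


# Toy 4 x 6 'image' whose pixel value encodes its position (row * 10 + col).
toy = [[r * 10 + c for c in range(6)] for r in range(4)]
small = crop_and_downsample(toy, 2)
print(small)
```

Centre-cropping assumes the region of interest sits roughly in the middle of the radiograph, which is a simplification; a dedicated region-of-interest step could replace `center_crop_box` without changing the rest.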

The dataset itself is derived originally from this paper, but is downloaded from this Kaggle repository.

Network Architecture

The bulk of the project revolves around the implementation of a Convolutional Neural Network. You will see more detail regarding this architecture in the Program Code section below, but the network structure is based on the MiniVGGNet architecture:

network_image

This network tends to perform well on a variety of image classification tasks, and will hopefully provide a good basis on which to build our network. The convolutional layers are essential here, as they convolve small image filters, known as kernels, over the image in order to extract features and information. Combined with the max pooling layers, which keep only the strongest response in each local region, this allows the most prominent features to be carried through the network, from which the model can infer further patterns and structures.
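The interplay between a kernel and max pooling can be shown on a toy example. The sketch below is purely illustrative and is not the project's network: it uses a hand-picked vertical-edge kernel (in a trained CNN the kernel values are learned), a tiny 5 x 5 'image', and hypothetical helper names `conv2d`, `relu` and `max_pool`.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation that CNN
    'convolution' layers typically compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(fmap):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a
    complete window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [max(fmap[i * size + a][j * size + b]
             for a in range(size) for b in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]


# Toy image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright transitions
# going from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(conv2d(image, kernel))
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the edge sits, and pooling halves the resolution while keeping that response, which is the sense in which strong features are carried deeper into the network.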

Processing Modules and Algorithms

The bulk of the processing comes from cleaning and formatting the dataset itself. This will involve:

Code Repository

The code is available as a Jupyter Notebook on my GitHub here.

Credits

Virus Icon Header made by SmashIcons.

Michele Pascale

PhD Student in Mathematics @ Queen Mary University of London
