Artificial Intelligence, Machine Learning and Data Science Hubspot


Tuesday, 19 January 2021

BLOOD CANCER DETECTION USING CNN – AI PROJECTS

INTRODUCTION

Blood consists of plasma and three types of cells: white blood cells, red blood cells, and platelets, each of which performs a particular task. Red blood cells transport oxygen from the lungs to the tissues of the body and carry carbon dioxide back to the lungs. White blood cells help the body fight diseases and infections. Platelets help blood to clot and control bleeding. Leukemia is a cancer of the blood cells in which the number of white blood cells increases abnormally; these are immature cells that interfere with the other blood cells, usually red blood cells and platelets. In healthy blood, the ratio of red to white blood cells is roughly 1000:1, meaning that for about every 1000 red blood cells there is 1 white blood cell.

Two types of white blood cells can turn into leukemia:

Lymphoid cells
Myeloid cells

Leukemia that arises from lymphoid cells is called lymphocytic or lymphoblastic leukemia; leukemia that arises from myeloid cells is known as myelogenous or myeloid leukemia. Leukemia is also grouped as acute or chronic, according to how fast the cells grow. The abnormal blood cells in acute leukemia are usually immature blasts (young cells) that do not work properly. These cells grow fast, and acute leukemia gets worse quickly unless it is treated immediately. In chronic leukemia, young blood cells are present, but mature, functional cells are also produced. Blasts grow slowly in chronic leukemia, so the disease takes longer to get worse.

The four major forms of leukemia are:

Acute lymphoblastic leukemia (ALL)
Acute myelogenous leukemia (AML)
Chronic lymphocytic leukemia (CLL)
Chronic myelogenous leukemia (CML)

PROBLEM DEFINITIONS

According to the Leukemia & Lymphoma Society, one person in the U.S. is diagnosed with a blood cancer approximately every 3 minutes, and an estimated 174,250 people in the U.S. were expected to be diagnosed with leukemia, lymphoma, or myeloma in 2018. The estimated number of new leukemia cases in 2019 was around 61,780, which according to the National Cancer Institute is 3.5 percent of all new cancer cases. In acute leukemia, if treatment does not begin in time, the patient can die within a few months. Detecting the cancer at an early stage is therefore essential for treating it, as it is for any type of cancer. Manual detection by technicians takes considerable time and effort, and instrument-based detection costs more.

PURPOSE

The purpose of our project is to develop a system that can automatically detect cancer from blood cell images. The system uses a convolutional neural network that takes a blood cell image as input and outputs whether or not the cell is cancerous. The appearance of cancer in blood cell images is often vague, can overlap with other diagnoses, and can mimic many benign abnormalities. These ambiguities cause considerable variability among medical personnel in the diagnosis of cancer. Automated detection of cancer from blood cell images at the level of expert medical personnel would not only be of tremendous benefit in clinical settings, it would also be invaluable in delivering health care to populations with inadequate access to diagnostic imaging specialists.

SCOPE AND APPLICATION

We develop a system that detects cancer from blood cell images at a level exceeding that of practicing medical personnel. This technology can improve healthcare delivery and increase access to medical imaging expertise in parts of the world where access to skilled medical personnel is limited.

LITERATURE REVIEW

Researchers have developed various techniques to detect leukemia. One of the most widely used in recent years is the convolutional neural network (CNN), a technique from computer vision. The common algorithm for this approach consists of several rigid steps: image pre-processing, clustering, morphological filtering, segmentation, feature selection or extraction, classification, and evaluation.

EXISTING METHODS FOR DIAGNOSIS

Medical history and physical examination: A record of the person’s present symptoms and past health problems. The medical history of the person’s family also helps in diagnosing leukemia.

Complete blood count (CBC): Blood is drawn and examined under the microscope to count the RBCs, WBCs, and platelets.

Bone marrow aspiration: Bone marrow is removed from the breastbone with a needle. The sample is examined under a microscope for abnormal cells.

Cytogenetic analysis: A cytogenetic test uses blood or bone marrow to identify individual chromosomes. It reveals chromosomal abnormalities, which help to diagnose leukemia and identify its type. Results are usually available within 3 weeks.

Immunohistochemistry: A sample of blood cells is treated with special antibodies. The resulting change in color can be seen under the microscope and helps determine which types of cells are present.

CNN TECHNIQUES

One study proposed a segmentation method using color-based clustering to obtain the nucleus region and cytoplasm area from stained blood smear images. SVM classifiers were applied to the relevant features and gave satisfactory results.
Another study proposed automatic detection of white blood cells (WBCs) in peripheral blood images and classification of five types of WBCs: eosinophil, basophil, neutrophil, monocyte, and lymphocyte. Eosinophils and basophils are first separated from the other WBCs by an SVM using a granularity feature. The other three types are then recognized using a convolutional neural network to extract features and a random forest to classify the WBCs from those features.

DATASET

The images used in this project were obtained from a public Kaggle dataset available online [9]. The dataset is divided into 2 classes. There were 4961 training images in total, of which 2483 were from healthy patients and 2478 were from patients affected with blood cancer. We tested the model on 1240 images, 620 from each class. The images had a resolution of 320×240.

CNN OVER OTHER ALGORITHMS

Many algorithms were used for image classification before CNNs became popular. People used to hand-craft features from images and then feed those features into a classification algorithm such as an SVM. Some algorithms also used the raw pixel values of the image as the feature vector. For example, you could train an SVM with 784 features, where each feature is one pixel value of a 28×28 image.
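As a small illustration of the pixel-vector approach, the sketch below flattens a 28×28 grayscale image into a 784-element feature vector of the kind an SVM would consume. The function name `flatten_image` and the synthetic pixel values are assumptions made for illustration; they do not come from the project.

```python
def flatten_image(image):
    """Concatenate the rows of a 2D pixel grid into one flat feature vector."""
    return [pixel for row in image for pixel in row]


# Hypothetical 28x28 grayscale image with made-up pixel values in 0..255.
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]

features = flatten_image(image)
print(len(features))  # one feature per pixel
```

Note that after flattening, the pixel directly below `image[0][0]` sits 28 positions away at `features[28]`, which is the loss of spatial structure that motivates CNNs.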

CNNs can be thought of as automatic feature extractors for images. An algorithm that works on a flat pixel vector loses much of the spatial interaction between pixels, whereas a CNN uses adjacent-pixel information: it first down-samples the image through convolution and then applies a prediction layer at the end.
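To make the idea of using adjacent pixels concrete, here is a minimal pure-Python sketch of one convolution followed by 2×2 max pooling on a toy image. The helper names, the 6×6 image, and the edge-detecting kernel are all hypothetical; a real CNN learns its kernels during training rather than having them written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most CNN libraries, this is cross-correlation: the kernel
    is not flipped before being applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Toy 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written 3x3 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 map of edge responses
pooled = max_pool_2x2(feature_map)         # 2x2 map after down-sampling
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the window straddles the edge, and pooling halves each dimension while keeping the strongest response.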

This concept was presented by Yann LeCun in 1998 for digit classification using a small convolutional network. It was later popularized in 2012 by AlexNet, which used a deeper stack of convolutional layers to achieve state-of-the-art results on ImageNet, making CNNs the algorithm of choice for image classification challenges from then on.

WORKING OF CNN

We have implemented a CNN for feature extraction and classification of the blood samples.

A CNN is a multilayered neural network with a special architecture for detecting complex features in data. CNNs have been used in image recognition, robot vision, reading text in images, and self-driving vehicles.

A CNN consists of layers of neurons and is optimized for two-dimensional pattern recognition. It has three types of layers: convolutional, pooling, and fully connected. Our network consists of 11 layers, excluding the input layer. The input layer takes in an RGB color image, where each color channel is processed separately.

The first 6 layers of the network are convolutional layers. The first 2 each apply 16 filters of size 3×3, the next 2 each apply 32 filters of size 3×3, and the last 2 each apply 64 filters of size 3×3. The nonlinear transformation sublayer uses the ReLU activation function. The max pooling sublayer applies a 2×2 filter, which halves the image size. At this point, the network has extracted 64 features, each represented by a 32×32 array for each color channel.
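The filter counts above can be turned into a rough count of trainable weights. The sketch below assumes standard convolutions that sum over all input channels and carry one bias per filter, as common deep learning libraries do; the post does not state this, nor where the pooling layers sit, so the helper names and resulting numbers are illustrative rather than a description of the authors' exact network.

```python
def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)


def conv_param_count(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def pooled_size(height, width, window=2):
    """Spatial size after non-overlapping max pooling."""
    return height // window, width // window


# Filter counts of the six convolutional layers, as given in the text.
filters = [16, 16, 32, 32, 64, 64]

in_channels = 3  # RGB input (assumption: channels are mixed by each filter)
params = []
for out_channels in filters:
    params.append(conv_param_count(in_channels, out_channels))
    in_channels = out_channels

print(params)                 # per-layer parameter counts
print(sum(params))            # total for the convolutional part
print(pooled_size(240, 320))  # one 2x2 pooling step on a 320x240 image
```

Under these assumptions the six convolutional layers hold 72,080 parameters in total, and a single pooling step reduces a 320×240 image to 160×120.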

The eighth layer is the flatten layer. It transforms a multidimensional array into a one-dimensional array by simply concatenating its entries. The output of the flatten layer is a one-dimensional array of size 4800. The ninth layer is a fully connected layer with the ReLU activation function that maps the 4800 input values to 64 output values. The tenth layer is a dropout layer: 50% of the values entering the layer are set to zero to reduce overfitting. The eleventh and final layer is a fully connected layer with the sigmoid activation function that maps the 64 input values to the 2 class labels.
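The layer sizes quoted above determine how many weights the two fully connected layers hold, and dropout and sigmoid are simple enough to sketch directly. The code below is a pure-Python illustration, not the authors' implementation. The helper names are hypothetical, and rescaling the surviving values by 1 / (1 - rate) ("inverted dropout") is an assumption borrowed from common libraries that the post does not mention.

```python
import math
import random


def dense_param_count(n_in, n_out):
    """One weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out


def dropout(values, rate, rng):
    """Training-time dropout: zero each value with probability `rate`.

    Survivors are scaled by 1 / (1 - rate) so the expected sum is
    unchanged (an assumption; see the note above).
    """
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


print(dense_param_count(4800, 64))  # ninth layer: 4800 inputs -> 64 outputs
print(dense_param_count(64, 2))     # final layer: 64 inputs -> 2 outputs

rng = random.Random(0)              # fixed seed so the run is repeatable
activations = [1.0] * 64            # made-up activations of the ninth layer
dropped = dropout(activations, rate=0.5, rng=rng)
print(sum(1 for v in dropped if v == 0.0), "of 64 values zeroed")

print(sigmoid(0.0))                 # a score of 0 maps to probability 0.5
```

Almost all of the fully connected weights sit in the 4800-to-64 layer (307,264 versus 130 in the output layer), which helps explain why dropout is applied right after it.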

First, we train the network on the training set to find appropriate filter weights in the three convolutional sublayers and the weights in the two fully connected layers that yield minimum error. Next, we evaluate the network on the validation set to obtain the validation error and the cross-entropy loss. We repeat this train-then-validate procedure until 10 epochs are complete. Last, we evaluate the performance of the network on the test set.
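The two validation quantities mentioned above, cross-entropy loss and error, can be computed as in the following sketch. It simplifies the network's two-unit output to a single predicted probability of the cancer class, and the labels and probabilities are made up for illustration. The function names and the 0.5 decision threshold are likewise assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def error_rate(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction is wrong."""
    wrong = sum(1 for y, p in zip(y_true, y_prob)
                if (p >= threshold) != bool(y))
    return wrong / len(y_true)


# Made-up validation labels (1 = cancer) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]

loss = binary_cross_entropy(y_true, y_prob)
error = error_rate(y_true, y_prob)
print(round(loss, 4), error)
```

In this toy batch only the third sample is misclassified, yet the loss also penalizes the correct predictions for being less than fully confident, which is why loss and error are tracked separately.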

CLASSIFICATION

Neural networks are used for the automatic detection of cancer in blood samples. A neural network was chosen as the classification tool because it is a well-known and successful classifier in many real applications. The training and validation processes are among the most important steps in developing an accurate model using CNNs. The dataset for these processes consists of two parts: a training feature set, used to train the CNN model, and a testing feature set, used to verify the accuracy of the trained feed-forward backpropagation network. During training, the connection weights were updated until the defined number of iterations or an acceptable error was reached.

RESULTS AND ANALYSIS

We assess how accurately the project detects cancer using the per-iteration training results, the loss and accuracy graphs, and the confusion matrix.

We ran twenty training iterations, over which the loss clearly decreases with each iteration. Loss measures how wrong the model’s predictions are, so we want to minimize the loss function. The loss declined steadily from the starting point, and each iteration brought the model closer to the minimum.

Next, we plotted the loss and accuracy curves for the best result of our model. These learning curves show the performance of the model on the training and validation sets as a function of the number of training iterations.

The loss curve decreases with each iteration, which shows that the loss is being minimized. The accuracy curve, on the other hand, increases with each iteration, which means the model is steadily getting better at learning.

Here, 0 denotes the class of people who do not have cancer and 1 the class of people who have cancer. The confusion matrix consists of true positive, true negative, false positive, and false negative values, from which different metrics are calculated, as shown in the figure below:

Here, recall is the most significant quantity, even more than accuracy and precision. Since the two classes do not contain an equal number of people, accuracy alone cannot be used as the measure of model quality. We also have to minimize the false negatives, which appear in the denominator of recall: the fewer the false negatives, the higher the recall.

Minimizing false negatives is our major concern because diagnosing a cancer patient as not having cancer is a far more serious error than diagnosing a healthy person as a cancer patient. That is why we built this model: to reduce the mistakes that doctors make accidentally.
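The relationship between the confusion matrix and these metrics can be sketched as follows. The counts are toy values chosen for easy arithmetic and are not the project's results; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # fn is in the denominator
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy counts for illustration only (class 1 = cancer is "positive").
metrics = classification_metrics(tp=8, tn=7, fp=3, fn=2)
print(metrics)

# Same counts, but with both false negatives caught: recall rises to 1.0.
no_misses = classification_metrics(tp=10, tn=7, fp=3, fn=0)
print(no_misses["recall"])
```

With 2 of the 10 cancer cases missed, recall is 0.8; eliminating the false negatives raises it to 1.0 even though the false positives are unchanged.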