Analysis of Machine Learning and Statistics Tool Box (Matlab R2016) over Novel Benchmark Cervical Cancer Database

Uterine cervix cancer is one of the leading cancers affecting the female population worldwide [1] [2]. The incidence of cervical cancer can be reduced by 80% through routine Pap smear testing. The Pap smear test requires skilled cytologists and is prone to inaccurate and inconsistent diagnosis due to manual error. Automated systems for easy recognition and proper staging of cancerous cells can assist medical professionals in correct diagnosis and in planning the proper treatment modality [3]. In this research, 23 well-known machine learning algorithms available in Matlab R2016 are extensively analyzed for their potential to classify Pap smear cases. To train and test the algorithms, a large database was created containing 8091 cervical cell images pertaining to 200 clinical cases collected from three medical institutes of northern India. The raw cases of cervical cancer, in the form of Pap smear slides, were photographed under a multi-headed digital microscope. After profiling, the cells were carefully assigned classes by multiple cytotechnicians and histopathologists [4]. Cervical cases have seven classes of diagnosis [4]. Quadratic SVM performed best among the 23 algorithms applied.

Keywords: Machine learning, Neural networks

Cervical cancer is the second most common form of cancer affecting the female population after breast cancer. This malignant cancer affects the cervix uteri, or cervical area of the female reproductive organs, through uncontrolled cell division and growth. Human papillomavirus (HPV), a non-enveloped icosahedral DNA virus with a diameter of 52-55 nm, is the main agent in the pathogenesis of cervical cancer [5]. More than 120 HPV types are acknowledged today [6]; among them, only 15 are classified as high-risk types [7], 3 as probable high-risk, and 12 as low-risk. The cells on the surface of the cervix affected by HPV show precancerous developments called CIN (cervical intraepithelial neoplasia), which pass through the stages CIN1, CIN2 and CIN3 and finally reach invasive cervical carcinoma (ICC). This progression takes place over a period of two to three decades [8].
The most important part of any therapy is therefore to detect and eliminate local CIN3 lesions before they progress to ICC [9]. According to the WHO system, the growth of CIN can be divided into three grades, 1, 2 and 3; at least two-thirds of CIN1, half of CIN2 and one third of CIN3 lesions have a chance of regressing to normal [9]. A newer system, the Bethesda system, categorizes cervical epithelial precursor lesions into two classes: Low-grade Squamous Intraepithelial Lesion (LSIL) and High-grade Squamous Intraepithelial Lesion (HSIL). LSIL corresponds to CIN1, while HSIL includes CIN2 and CIN3 [10].

International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470 | Available Online @ www.ijtsrd.com | Volume 2 | Issue 1 | Nov-Dec 2017

Machine Learning.
Machine learning, a branch of artificial intelligence, produces computer programs that learn from data samples without being explicitly programmed; it thus relates learning from data to the common concept of inference [11][12][13]. In the biomedical field, machine learning, with its different techniques and algorithms, has proven its ability to reach an acceptable generalization by searching through an n-dimensional space of complex biomedical datasets [14]. Machine learning algorithms are trained by two methods: 1) supervised learning and 2) unsupervised learning. A machine learning algorithm provided with data samples of lower dimension generally produces better results than one provided with samples of high dimensionality [15].
Dimension reduction, or feature selection, is done through methods called embedded, filter and wrapper approaches [15]. The models in machine learning are usually trained to classify data items into one of several predefined classes. A good classification model is rated on the basis of its classification and generalization errors. Machine learning has a large number of algorithms able to learn the intricate relationships existing in complex multidimensional datasets, e.g. ANN, KNN, SVM and decision trees.
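As an illustration of the filter approach to feature selection, the following minimal Python sketch ranks features by a Fisher-style score (between-class variance over within-class variance), computed independently of any classifier. It is not taken from the study, which used the Matlab toolbox; the scoring rule, the function names `fisher_scores` and `select_top_k`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Score each feature by between-class over within-class variance."""
    n_features = len(samples[0])
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = []
    for j in range(n_features):
        overall_mean = mean(row[j] for row in samples)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            column = [row[j] for row in rows]
            between += len(column) * (mean(column) - overall_mean) ** 2
            within += len(column) * pvariance(column)
        # A small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(samples, labels, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.9], [3.1, 3.1]]
y = ["normal", "abnormal", "normal", "abnormal"]
y = ["normal", "normal", "abnormal", "abnormal"]
print(select_top_k(X, y, 1))
```

A filter of this kind is cheap because it never trains a classifier; wrapper and embedded approaches instead involve the learning algorithm itself in the selection.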

Methods.
Decision trees are tree-structured classifiers in which an attribute is tested at each internal node, and each outgoing branch from an internal node represents one of the possible values of the test. Each test instance traces a particular path from the root node through the internal nodes, based on the test results, and halts at a leaf node holding the class label for that instance. Decision trees are trained by techniques such as ID3, C4.5 and CART.
SVM classifies instances of different classes by constructing a set of hyperplanes in a high-dimensional space. The hyperplane that separates the classes by the largest margin (the maximum-margin hyperplane) is chosen for constructing the classifier.
KNN is a nonparametric, instance-based method for classification. KNN assigns an instance to the class most common among its K nearest neighbors.
An ensemble system of classification engages a number of independently trained classifiers to propose the class label for a test instance. An ensemble system typically produces greater classification accuracy than its individual classifiers.
Artificial neural networks (ANN) act as a gold-standard method in a number of classification tasks and in the non-linear analysis of complex data [16] [17] [18]. An ANN architecture consists of a number of independent nodes (processing units) arranged in input, hidden and output layers and joined by weighted connections. The number of nodes in the input layer corresponds to the number of clinical variables in the data sample. Each node in the hidden layer receives the weighted signals from the input nodes and calculates its output by passing the sum of the weighted input values through an activation function. The output nodes then produce the output of the network by passing the sum of the weighted signals received from the hidden nodes through an activation function.
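The forward computation of such a network can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in the study: the sigmoid activation, the absence of bias terms, the function names and the example weights are all hypothetical choices.

```python
import math


def sigmoid(z):
    """Logistic activation function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights):
    """Each node sums its weighted inputs and applies the activation.

    `weights` holds one list of incoming weights per node in the layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
        for node_weights in weights
    ]


def forward(inputs, hidden_weights, output_weights):
    """Propagate one sample through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)


# Hypothetical example: 3 input variables, 2 hidden nodes, 2 output nodes.
sample = [0.5, -1.0, 2.0]
hidden_weights = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
output_weights = [[0.7, -0.6], [-0.5, 0.9]]

outputs = forward(sample, hidden_weights, output_weights)
# The index of the largest output is taken as the predicted class.
predicted_class = max(range(len(outputs)), key=lambda i: outputs[i])
print(outputs, predicted_class)
```

Training, that is, adjusting the weights from labeled samples, is deliberately left out; the sketch covers only how a trained network maps an input sample to an output.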

Literature Review.
[19] designed an automated cervical cell segmentation and classification system. Using the fuzzy c-means (FCM) clustering technique, the system segmented each cervical cell into cytoplasm and nucleus regions. Five machine learning algorithms, KNN, ANN, SVM, LDA and a Bayesian classifier, were implemented to classify the segmented cells into their respective classes of diagnosis. [20] assessed the capability of an artificial neural network to clearly distinguish malignant from benign breast cancer cases and also to predict the probability of breast cancer for individual patients. A large dataset consisting of 62,129 mammography findings was used to train a three-layer feed-forward network. [3] proposed an innovative ensemble-of-ensembles technique, called the hybrid ensemble method, to increase the classification efficiency of AI-based automated screening models. [21] surveyed the applicability of recent machine learning techniques in cancer prognosis and prediction. A variety of machine learning techniques, including ANN, SVM, decision trees and Bayesian networks, have been widely used in the development of automated predictive models.

[...] of the Pap smear images to their respective classes of diagnosis [3]. In this research we extensively tested the screening potential of 23 machine learning algorithms over a database of 8091 Pap smear images against four performance metrics: classification accuracy, precision, sensitivity and F-measure. The classification results of all classifiers under 10-fold cross-validation are tabulated in Table 4. Quadratic SVM, with a classification accuracy of 78.25% and an F-value of 0.69490, was the best classifier. The digital database developed, along with promising machine learning algorithms, especially quadratic SVM, can play a pivotal role in designing an automated cervical cancer detection tool for efficient and timely detection of cancer.
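The four performance metrics named above can be computed from true and predicted labels as in the following Python sketch. The text does not state how per-class values were combined across the seven classes, so macro averaging (an unweighted mean over classes) is assumed here; the function names and the toy labels are hypothetical.

```python
def per_class_counts(y_true, y_pred, label):
    """Count true positives, false positives and false negatives for a class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, sensitivity and F-value."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, sensitivities, f_values = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f_value = 2 * precision * sensitivity / denom if denom else 0.0
        precisions.append(precision)
        sensitivities.append(sensitivity)
        f_values.append(f_value)

    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    n_classes = len(labels)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / n_classes,
        "sensitivity": sum(sensitivities) / n_classes,
        "f_value": sum(f_values) / n_classes,
    }


# Hypothetical toy labels; the study itself distinguishes seven classes.
y_true = ["LSIL", "LSIL", "HSIL", "HSIL"]
y_pred = ["LSIL", "HSIL", "HSIL", "HSIL"]
print(macro_metrics(y_true, y_pred))
```

Under 10-fold cross-validation, such metrics would be computed on the held-out fold of each split and then aggregated across the ten folds.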

S.no
Machine Learning Algorithm Classification Accuracy Precision Sensitivity F-Value