Decision Tree Models for Medical Diagnosis

Data mining techniques are being developed rapidly for many applications. In recent years, data mining in healthcare has become an emerging field for the research and development of intelligent medical diagnosis systems. Classification is a major research topic in data mining, and decision trees are popular classification methods. In this paper, several decision tree classifiers, namely AD Tree, J48, NB Tree, Random Tree and Random Forest, are applied to the analysis of medical datasets. The heart disease, diabetes and hepatitis datasets are used to test the decision tree models.


INTRODUCTION
At present, data mining has a significant impact on the information industry because of the wide availability of huge datasets stored in databases of various types. Data mining is being put into practice and studied for many kinds of data stores, including relational, object-relational and object-oriented databases, data warehouses, transactional databases, unstructured and semi-structured repositories, and spatial, multimedia, time-series and textual databases [6].
Different data mining methods serve different purposes, and each has its own advantages and disadvantages. In data mining, classification plays a crucial role in analyzing labeled data. Classification is a supervised learning method in which the target classes are predefined [1]. It is important in real-world applications, including the medical field, where decision trees play a vital role in diagnosing a patient's condition. In this paper, various decision tree classifiers are used to analyze medical datasets.
The rest of the paper is organized as follows. Section 2 reviews related work, and Section 3 presents an overview of the classification algorithms. The experimental results are discussed in Section 4. Finally, Section 5 concludes the study.

RELATED WORKS
Many papers have evaluated the performance of decision tree classifiers. G. Sujatha [7] analyzed the performance of decision tree induction algorithms on tumor medical datasets in terms of accuracy and time complexity. T. Karthikeyan [8] dealt with various classification algorithms, namely Bayes.NaiveBayes, Bayes.BayesNet, Bayes.NaiveBayesUpdatable, J48, Random Forest and Multilayer Perceptron, and used them to analyze hepatitis patient data from the UC Irvine machine learning repository. T. Swapna [9] analyzed classification algorithms for Parkinson's disease classification, carrying out a comparative study of different classification methods on that dataset and comparing their accuracy to identify the best classification rule. In [10], training and test diabetes datasets are used to predict diabetes mellitus with various classification techniques; Bayesian statistical classification, the J48 decision tree and SVM are applied to the data and compared to form a prediction model. E. Venkatesan [1] analyzed the performance of decision tree algorithms for breast cancer classification. Anju Jain et al. [11] reviewed the use of machine learning algorithms such as decision trees, support vector machines, random forests, evolutionary algorithms and swarm intelligence for accurate medical diagnosis, and proposed a medical diagnosis system based on machine learning techniques. In recent years, various papers have proposed medical diagnosis methods based on data mining and machine learning.

DECISION TREE CLASSIFIERS
Decision tree learning uses a decision tree to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models in which the target variable can take a discrete set of values are called classification trees.
Yoav Freund and Llew Mason introduced the Alternating Decision Tree (ADTree), a machine learning method for classification that generalizes decision trees. In an ADTree, prediction nodes appear both at the root and at the leaves. An instance is classified by following every path whose decision nodes are satisfied and combining the prediction nodes encountered along the way. In this respect the ADTree differs from ordinary binary classification trees, in which an instance follows only a single path [1].
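As a rough illustration of how an alternating decision tree scores an instance, the following Python sketch sums the prediction values along every path whose conditions hold and takes the sign of the total as the class. The node classes, the feature names (age, chol) and all numeric values are hypothetical; they are not taken from the datasets or from [1].

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PredictionNode:
    value: float  # contribution added to the score when this node is reached
    splitters: List["SplitterNode"] = field(default_factory=list)


@dataclass
class SplitterNode:
    condition: Callable[[dict], bool]  # test applied to the instance
    if_true: PredictionNode
    if_false: PredictionNode


def adtree_score(node: PredictionNode, instance: dict) -> float:
    """Sum the prediction values on all paths the instance satisfies."""
    total = node.value
    # Unlike an ordinary decision tree, every splitter under a prediction
    # node is evaluated, so several paths contribute to the final score.
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(instance) else splitter.if_false
        total += adtree_score(branch, instance)
    return total


def adtree_classify(root: PredictionNode, instance: dict) -> str:
    """The sign of the summed score gives the predicted class."""
    return "present" if adtree_score(root, instance) > 0 else "absent"


# Hypothetical tree: thresholds and values are made up for illustration.
root = PredictionNode(
    value=0.25,
    splitters=[
        SplitterNode(lambda x: x["age"] > 50, PredictionNode(0.5), PredictionNode(-0.5)),
        SplitterNode(lambda x: x["chol"] > 240, PredictionNode(0.25), PredictionNode(-0.25)),
    ],
)
```

With this toy tree, an instance such as {"age": 60, "chol": 200} receives contributions from the root and from one branch of each splitter, which is the behavior that distinguishes an ADTree from a single-path tree.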
J48 is a tree-based learning approach. It implements the C4.5 algorithm developed by Ross Quinlan, which extends his Iterative Dichotomiser 3 (ID3) algorithm. J48 uses a divide-and-conquer strategy: starting from the root node, it recursively splits the instances into partitions until leaf nodes (target nodes) are reached. Given a set T of instances, the tree structure is built following the steps described in [2].
NB-Tree is a hybrid of a decision tree and Naïve Bayes. The basic concept of recursive partitioning remains the same, but the leaf nodes are Naïve Bayes classifiers rather than nodes that predict a single class [3].
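The divide-and-conquer step of a C4.5-style learner depends on scoring candidate attributes. The sketch below assumes the gain ratio criterion associated with C4.5, which is not stated in this paper, and handles categorical attributes only. The function names are illustrative and do not correspond to any J48 API.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    # Partition the class labels by attribute value.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Expected entropy remaining after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the classes perfectly gets the highest score, while one whose partitions keep the original class mix scores zero; a tree learner would split on the best-scoring attribute and recurse on each partition.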
Random Tree (RT) is an efficient algorithm that constructs a tree by considering K randomly chosen features at each node. A random tree is a tree drawn at random from a set of possible trees. Random trees can be generated efficiently, and combining large sets of random trees generally leads to accurate models [4].
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [5].
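One way to picture the "K random features at each node" idea is the sketch below. It samples K feature indices and keeps the one giving the purest partition. The use of Gini impurity, the restriction to categorical features and the function names are assumptions made for illustration, not details taken from [4].

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def random_node_split(rows, labels, k, rng):
    """Choose the best of k randomly sampled categorical features for one node.

    Returns the chosen feature index and the list of sampled candidates.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), k)
    best_feature, best_impurity = None, float("inf")
    for f in candidates:
        # Group the labels by the value each row takes on feature f.
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        # Weighted impurity of the partition induced by feature f.
        impurity = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if impurity < best_impurity:
            best_feature, best_impurity = f, impurity
    return best_feature, candidates
```

Because only K features are examined per node, each node is cheap to build, and different random draws yield different trees from the same data.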
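The aggregation step of a random forest can be sketched in a few lines. The function names and the sample votes are hypothetical, and the per-tree predictions are assumed to be already available.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the mode (most frequent class) of the individual tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the mean of the individual tree predictions."""
    return mean(tree_outputs)


# Hypothetical predictions from three trees for one patient record.
votes = ["present", "absent", "present"]
predicted_class = forest_classify(votes)
```

With an odd number of trees and two classes, as in this toy example, the vote cannot tie. How ties are broken in other cases is an implementation detail that this sketch does not address.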

EXPERIMENTAL RESULTS
The heart disease, diabetes and liver disease datasets from the UCI machine learning repository are used for the classification task. For each dataset, 66% of the records are used for training and the remaining 34% are used for testing.
The heart disease dataset contains 270 observations and 2 classes: presence and absence of heart disease. There are 150 records for patients without heart disease and 120 records for patients with heart disease. The results of the classifiers are shown in
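A 66%/34% percentage split can be sketched as follows. Shuffling before the split, the fixed seed and rounding the training size to the nearest integer are all assumptions, since the paper does not describe how the split was drawn. The count of 270 is the heart disease dataset size.

```python
import random


def percentage_split(n_records, train_fraction=0.66, seed=1):
    """Shuffle record indices and split them into training and test sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_train = round(n_records * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


# 270 records, as in the heart disease dataset.
train_idx, test_idx = percentage_split(270)
```

Under these assumptions the 270 heart disease records divide into 178 training and 92 test records. A different rounding rule could shift one record between the two sets.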

CONCLUSION
In this paper, data mining algorithms are used for medical diagnosis. The focus of the paper is to apply different decision tree models to disease prediction in medical diagnosis and to evaluate the classifiers in terms of classification accuracy. In the future, a new optimized intelligent system could be designed for the medical field using data mining approaches and algorithms.