Most Important Organs In Our Body


Introduction

The heart is one of the most important organs in our body. It is a four-chambered mechanical pump made of complex muscle. The four chambers are separated by valves and divided into two halves, each containing one chamber called an atrium and one called a ventricle. The atria collect blood and the ventricles contract to push blood out of the heart. The right half of the heart pumps oxygen-poor blood to the lungs, where the blood cells take up more oxygen. Newly oxygenated blood travels from the lungs into the left atrium and then the left ventricle, which pumps this oxygen-rich blood to the organs and tissues of the body. This oxygen provides our body with energy and is essential for keeping the body healthy [1].

India is undergoing a rapid epidemiological transition as a consequence of economic and social change, and heart disease is becoming an increasingly important cause of death. According to the WHO world health statistics report, mortality due to cardiac causes has overtaken mortality due to all cancers put together. In India alone, there are about 4,200 sudden cardiac deaths (heart attacks) per one lakh deaths annually. It is estimated that by the year 2030 India will rank among the countries at highest risk of heart disease [2]. At present, eight out of every 10 deaths in urban India are caused by non-communicable diseases (NCDs) such as CVD; in rural India, six out of every 10 deaths are due to NCDs [3]. Studies to determine the precise cause of death in Andhra Pradesh have revealed that CVD causes about 30% of deaths in rural areas. Several studies show a high prevalence of diabetes and other risk factors for heart disease in A.P.; a study in rural Andhra Pradesh showed that CVD was the leading cause of mortality, accounting for 32% of all deaths, a rate as high as in Canada (35%) and the U.S.A. Hence there is an urgent need for the development and implementation of suitable prevention approaches to control this epidemic.

Health care organizations produce and store huge amounts of patient data. With the increasing volume of data stored in medical databases, there is a need to automate the extraction of medical data, both to obtain interesting information and to eliminate manual tasks. Medical data are processed and analyzed using different data mining techniques to extract useful information. This extracted information is valuable for diagnosis, decision making, risk analysis and prediction.

Data mining is defined as the non-trivial process of identifying valid, novel and potentially useful patterns in data. Data mining is a crucial step in KDD. In recent years data mining has found a significant role in health care. Data mining can enable health care organizations to predict trends in patients' conditions and behaviors. Medical data mining provides countless possibilities to extract hidden patterns from huge amounts of medical data.

Predicting the outcome of a disease is a challenging task in medical data mining. Data mining has shown good results in the prediction of heart disease and is widely applied for its prediction and classification. The objective of this thesis is to study the prediction of risk factors for heart disease in A.P. using data mining techniques. This work focuses on applying different data mining techniques, especially association rule mining, classification and clustering, to develop a model that predicts heart disease for a patient in the Andhra Pradesh population.

Literature survey

2.1 Overview of data mining

Data mining is a multidisciplinary field used to extract knowledge from huge repositories. Data mining merges ideas from areas such as machine learning, statistics, pattern recognition, artificial intelligence and data visualization. It has attracted a great deal of attention in the information-technology-driven society due to the wide availability of huge amounts of data and the need to transform this data into useful information. The information gained can be used in a wide range of applications, including market analysis, telecommunications and health data mining [4].

Data mining is an integral part of the broader process known as knowledge discovery in databases (KDD), which converts raw data into useful information. According to Fayyad et al. [5], KDD is a non-trivial process of finding novel, valid, potentially useful and understandable patterns in data. The KDD process model consists of nine transformation steps that are executed iteratively. The steps involved in KDD are:

Understanding the application domain

This is the first step where domain experts need to understand and define the goals of the end user.

Creating a target data set

After defining the goals, the data which is used for the KDD process should be determined. This process is very important as data mining learns and discovers information from the available data.

Data cleaning and preprocessing

Real-world data is noisy, irrelevant and contains redundant attributes. Data reliability is enhanced in this stage. Various preprocessing techniques such as data cleaning and data integration are applied during this stage.

Data transformation

In the data transformation stage, data are transformed into forms appropriate for mining. This is a crucial step for the success of KDD. Data transformation methods include dimensionality reduction and attribute transformation; for instance, data transformation may perform summary or aggregation operations.

Kinds of knowledge to be mined

In this stage we decide which data mining function to use. The kinds of knowledge to be mined include predictive or descriptive tasks, characterization, association rule mining, classification and prediction, clustering, outlier and evolutionary analysis.

Choosing the appropriate data mining algorithm

This stage involves selecting the appropriate data mining algorithm for the data mining task. Meta-learning approaches attempt to understand the conditions under which a given data mining algorithm is most appropriate.

Implementation of data mining algorithm

In this stage we need to employ the data mining algorithm several times until we obtain a satisfactory result.

Evaluation and interpretation of mined patterns

In this stage we interpret and evaluate the mined results with respect to the goals defined in the first step, and we consider the effect of the preprocessing steps on the data mining result. Discovered knowledge is also documented in this step for future use.

Using the discovered knowledge

In this stage the knowledge is incorporated into another system for further action. The success of this stage determines the effectiveness of KDD.
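To make these nine steps concrete, the following is a minimal, hypothetical sketch of a KDD-style pipeline in Python using pandas and scikit-learn. The file name heart.csv, the column name target and the choice of a decision tree are illustrative assumptions, not details taken from this thesis.

```python
# A minimal KDD-style pipeline sketch (hypothetical data set, file name and columns).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: create a target data set (assumed file and columns).
data = pd.read_csv("heart.csv")          # e.g. age, bp, cholesterol, ..., target

# Step 3: data cleaning and preprocessing.
data = data.drop_duplicates().dropna()

# Step 4: data transformation (scaling numeric attributes).
X = data.drop(columns=["target"])
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Steps 5-7: choose and repeatedly apply a mining algorithm (a decision tree here).
model = DecisionTreeClassifier(criterion="entropy", random_state=42)
model.fit(X_train, y_train)

# Step 8: evaluate and interpret the mined patterns.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```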

Interestingness measures

Interestingness measures provide a framework for many data mining algorithms. These measures play a vital role in applications where full automation is desired [6].

A data mining system has the potential to generate hundreds or even thousands of patterns, but not all generated patterns are interesting. Typically only a small fraction of the patterns would actually interest a given user. A discovered pattern is interesting to the user if it is novel, useful and easily understandable [7].

Interestingness measures are classified into two types:

Objective interestingness measures:

Objective measures are based on the structure of the discovered patterns and the statistics underlying them. These measures may fail to highlight important patterns generated by the data mining system.

Subjective interestingness measures:

Subjective measures usually determine whether a pattern is actionable and/or unexpected.

According to Piatetsky-Shapiro, a rule-interest measure RI for a rule A => B should satisfy three principles:

RI = 0 if |A ∩ B| = |A||B| / N, i.e., if A and B are statistically independent;

RI increases monotonically with |A ∩ B| when the other parameters are fixed;

RI decreases monotonically with |A| (or |B|) when the other parameters are fixed [8].

Description of measures

Support:

Support is an objective measure. The support of an association rule X => Y is the percentage of transactions in the database that contain X ∪ Y. It is defined as

Support(X => Y) = P(X ∪ Y)

The strength of an association rule can be measured using the support measure.

Confidence:

The confidence measure is used to generate rules from the frequent item sets. It is used in conjunction with support.

Confidence(X => Y) = P(X ∩ Y) / P(X)

Confidence measures the reliability of the inference made by the rule.

Lift:

This measure is also called interest. Lift filters the rules generated by support and confidence.

Lift(X => Y) = P(X ∩ Y) / (P(X) · P(Y))

If Lift(A, B) = 1 then A and B are independent.

If Lift(A, B) > 1 then A and B are positively correlated.

If Lift(A, B) < 1 then A and B are negatively correlated.

Correlation coefficient:

The correlation coefficient measure is used to mine both positive and negative association rules. It is defined as

Corr(X, Y) = Support(X ∪ Y) / (Support(X) · Support(Y))

If Corr(X, Y) > 1, X and Y are positively correlated.

If Corr(X, Y) = 1, X and Y are independent.

If Corr(X, Y) < 1, X and Y are negatively correlated.

Laplace:

This measure is used for classification. It is a confidence estimator that takes the support measure into account.

Laplace(X => Y) = (Support(X ∪ Y) + 1) / (Support(X) + 2)

Cosine:

This measure was introduced by Tan et al. and can be viewed as a harmonized lift measure. It measures the distance between the antecedent and the consequent. It is defined as

Cosine(X => Y) = Support(X ∪ Y) / √(Support(X) · Support(Y))

Cosine values range between [0, 1].

Jaccard coefficient:

This measure is used to find the distance between the antecedent and the consequent of an association rule. It measures the fraction of cases covered by both with respect to the fraction of cases covered by either of them. Its value ranges between 0 and 1; a value of 1 indicates that the antecedent and the consequent cover the same cases. It is defined as

Jacc(X => Y) = Supp(X ∪ Y) / (Supp(X) + Supp(Y) - Supp(X ∪ Y))
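The following small Python sketch illustrates how several of these measures (support, confidence, lift, cosine and the Jaccard coefficient) are computed for a rule X => Y. The transactions and the example rule are made-up toy data, not data from this study.

```python
# Illustrative computation of rule-interestingness measures on toy transactions.
from math import sqrt

transactions = [
    {"chest_pain", "high_bp", "heart_disease"},
    {"high_bp", "diabetes"},
    {"chest_pain", "heart_disease"},
    {"chest_pain", "high_bp", "diabetes", "heart_disease"},
    {"diabetes"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def measures(X, Y):
    s_x, s_y, s_xy = support(X), support(Y), support(X | Y)
    return {
        "support": s_xy,
        "confidence": s_xy / s_x,
        "lift": s_xy / (s_x * s_y),
        "cosine": s_xy / sqrt(s_x * s_y),
        "jaccard": s_xy / (s_x + s_y - s_xy),
    }

# Example rule: {chest_pain, high_bp} => {heart_disease}
print(measures({"chest_pain", "high_bp"}, {"heart_disease"}))
```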

2.1 Data mining techniques

Data mining consists of various methods. Each method serves a different purpose and has its own advantages and disadvantages. Data mining techniques specify the kind of patterns to be found in a data mining task. Data mining tasks can be divided into descriptive and predictive tasks. Descriptive tasks characterize the general properties of the data in the data warehouse, whereas predictive tasks perform inference on the current data in order to make predictions. Descriptive data mining tasks are exploratory in nature and require post-processing techniques to validate the results. Predictive data mining tasks include classification, prediction, regression and time-series data mining. Descriptive tasks include association rule mining, clustering and sequence analysis. The main descriptive and predictive data mining tasks are described below.

2.1.1 Association rule mining

Association rule mining, one of the most widely used and well-researched techniques in data mining and knowledge discovery, was introduced by Agrawal et al. [10]. Association rule mining was primarily proposed for market basket analysis, to study consumer purchasing habits. The problem of association rule mining is decomposed into two sub-problems:

Finding frequent item sets

Generating rules from frequent item sets

The performance of association rule mining is determined mainly by the step of finding frequent item sets. Association rule mining is an unsupervised learning approach that discovers interesting associations or correlations among the frequent item sets. An item set is said to be frequent if it appears at least a minimum number of times in a transactional database. Association rules must satisfy two interestingness measures, namely support and confidence. A rule that satisfies minimum support and minimum confidence is said to be a strong association rule. An association rule is an implication of the form A => B, where A and B are two frequent item sets and A ∩ B = ∅.

The main association rule mining algorithm, Apriori, was proposed by Agrawal and Srikant [11]. It works on the principle of the Apriori property, i.e., every subset of a frequent item set is also frequent. Apriori is a level-wise algorithm that uses the downward closure property and support-based pruning, which systematically controls the growth of candidate item sets.

The Apriori algorithm generates candidate item sets by performing the following two operations:

1) Candidate generation:

To find Lk, candidate k-item sets are generated by joining Lk-1 with itself; this set of candidates is denoted by Ck.

2) Candidate pruning:

This step eliminates candidate item sets that will not satisfy the minimum support.

The Apriori algorithm suffers from two major bottlenecks:

Repeated database scans

It may need to generate a huge number of candidate item sets.

There are various methods to improve the efficiency of the Apriori algorithm:

1) Hash based technique

2) Transaction reduction

3) Partitioning

4) Sampling

5) Dynamic item set counting

Association rule mining has played a major role in data mining, and the discovered rules are useful in real-world applications. Association rule mining becomes challenging when the data set is very large, as the number of association rules grows exponentially with the number of frequent items. This problem is tackled with efficient algorithms that prune the search space.
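As an illustration of the level-wise candidate generation and support-based pruning described above, here is a compact, self-contained Apriori sketch in Python. The toy transactions and the minimum support threshold are hypothetical examples, not the data set used in this thesis.

```python
# A compact Apriori sketch: level-wise candidate generation with support-based pruning.
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    support = lambda items: sum(items <= t for t in transactions) / n

    # L1: frequent 1-item sets.
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {c: support(c) for c in current}

    k = 2
    while current:
        # Candidate generation: join L(k-1) with itself.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Candidate pruning: every (k-1)-subset must already be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        # Support counting against the database.
        current = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

transactions = [
    frozenset({"chest_pain", "high_bp", "heart_disease"}),
    frozenset({"high_bp", "diabetes"}),
    frozenset({"chest_pain", "heart_disease"}),
    frozenset({"chest_pain", "high_bp", "diabetes", "heart_disease"}),
]
print(apriori(transactions, min_support=0.5))
```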

2.1.2 Classification

The problem of classification is an important research topic in pattern recognition, machine learning and data mining. Classification is a decision-making process in which, on the basis of a set of instances, a new instance is assigned to one of the possible groups or classes. Classification is also called supervised learning, since it induces knowledge from training data. Some of the major classification algorithms are discussed in the following subsections.

2.1.2.1 Decision tree

One of the most popular and promising approaches in data mining is the use of decision trees. Decision trees are a simple and successful technique for predicting the class label. Apart from data mining, decision trees are used in other areas such as machine learning, pattern recognition, text mining and information retrieval systems [12].

A decision tree offers many benefits. Some are listed as follows.

Versatility for a wide variety of data mining tasks such as classification, regression, clustering and feature selection

Easy to follow and self explanatory tools

Flexibility in handling nominal, numeric and textual data types

Useful for large data sets.

High predictive performance for small computational cost

A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class.

To classify an unknown sample, the attribute values of the sample are tested against the decision tree, and a path is traced from the root to a leaf node, which holds the class prediction.

The basic algorithm for decision tree induction is described as follows.

Step 1: create a node N

Step 2: if samples are all of the same class C then return node N as a leaf node labeled with class C

Step 3: if attribute set is empty then return node N as a leaf node labeled with the most common class in samples.

Step 4: select the test attribute, i.e., the attribute among the attribute set with the highest information gain.

Step 5: label node N with test attribute

Step 6: for each known value ai of test attribute

Step 7: Grow a branch from node N for the condition test attribute = ai

Step 8: let Si be the set of samples in samples for which test attribute = ai

Step 9: if Si is empty then attach a leaf labeled with the most common class in samples.

Step 10: else attach the node returned by recursively generating a decision tree on Si with the test attribute removed.

After constructing the decision tree, a tree-pruning step is performed to reduce its size. Tree pruning trims the branches of the initial tree in order to improve the generalization capability of the decision tree.

Some weaknesses of decision trees are:

Some decision trees deal only with binary valued target classes.

The process of growing decision tree is computationally expensive.
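A minimal sketch of decision tree classification with scikit-learn follows; the entropy criterion corresponds to the information-gain-based attribute selection described in the algorithm above. The data set is a stand-in from scikit-learn, and the depth limit is an illustrative pruning choice, not a detail of this thesis.

```python
# Decision tree sketch: entropy-based splitting, with max_depth as a simple pruning control.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)            # stand-in medical data set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits tree size, analogous to pruning for better generalization.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))                               # human-readable tree structure
```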

2.1.2.2 Bayesian classifier

Bayesian classifiers are statistical classifiers based on Bayes' theorem. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. The naïve Bayesian classifier uses all the attributes contained in the data and analyzes them individually as though they were equally important and independent of each other. In theory, Bayesian classifiers have the minimum error rate in comparison with all other classifiers. In practice, however, this is not always the case, owing to inaccuracies in the assumptions made for their use, such as class conditional independence, and to the lack of available probability data.

The Bayesian classifier works as follows. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes A1, A2, ..., An, respectively.

Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if

P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.

Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

P(Ci|X) = P(X|Ci) P(Ci) / P(X).

As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized. If the class prior probabilities are not known, it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = ... = P(Cm), and we would therefore maximize P(X|Ci); otherwise we maximize P(X|Ci) P(Ci). Note that the class prior probabilities may be estimated by P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training tuples of class Ci in D.

Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). In order to reduce this computation, the naïve assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another given the class label of the tuple (i.e., that there are no dependence relationships among the attributes). Thus,

P(X|Ci) = P(x1|Ci) · P(x2|Ci) · ... · P(xn|Ci).

In order to predict the class label of X, P(X|Ci) P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is the class Ci if and only if

P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i.

In other words, the predicted class label is the class Ci for which P(X|Ci) P(Ci) is the maximum.
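The following is a minimal Python sketch of a naïve Bayes classifier over two binary attributes, mirroring the formulas above (priors from class counts, a product of per-attribute conditional probabilities, and add-one smoothing). The training tuples and attribute names are hypothetical examples, not data from this thesis.

```python
# A minimal naive Bayes sketch over categorical attributes, mirroring the formulas above.
from collections import Counter, defaultdict

# Hypothetical training tuples: (chest_pain, high_bp) -> class label.
D = [
    (("yes", "yes"), "disease"),
    (("yes", "no"),  "disease"),
    (("no",  "yes"), "disease"),
    (("no",  "no"),  "healthy"),
    (("no",  "yes"), "healthy"),
    (("no",  "no"),  "healthy"),
]

classes = Counter(label for _, label in D)
prior = {c: n / len(D) for c, n in classes.items()}          # P(Ci) = |Ci,D| / |D|

# Conditional counts for P(xk | Ci).
cond = defaultdict(Counter)
for x, label in D:
    for k, value in enumerate(x):
        cond[label][(k, value)] += 1

def p_xk_given_c(k, value, c):
    # Add-one (Laplace) smoothing; each attribute here has two possible values.
    return (cond[c][(k, value)] + 1) / (classes[c] + 2)

def predict(x):
    # Choose Ci maximizing P(X|Ci) * P(Ci), with P(X|Ci) as a product over attributes.
    score = {}
    for c in classes:
        p = prior[c]
        for k, value in enumerate(x):
            p *= p_xk_given_c(k, value, c)
        score[c] = p
    return max(score, key=score.get)

print(predict(("yes", "yes")))   # expected: "disease"
```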

2.1.2.3 Classification by neural networks

The field of neural networks was inspired by attempts to simulate biological neural systems. The evolution of neural networks originates from the work of McCulloch and Pitts in 1943.

A neural network is a set of connected input and output units, where each connection has a weight associated with it. In the learning phase, the neural network learns by adjusting the weights so as to be able to predict the class label of a sample.

Advantages of neural networks include

High tolerance to noisy data

Ability to classify patterns on which they have not been trained.

Extraction of rules from trained neural networks

Disadvantages of neural networks are

Neural networks involve long training times

Poor interpretability

Multilayer feed forward neural network:

An artificial neural network has a more complex structure than the perceptron model. The inputs are fed to the input layer. The weighted outputs of these units are fed to a layer known as a hidden layer. The hidden layer's weighted outputs can, in turn, be input to another hidden layer, and so on. The weighted outputs of the last hidden layer are input to the output layer, which emits the network's prediction for a given sample.

Back propagation is a neural network learning algorithm. It learns by iteratively processing a set of training samples, comparing the network's prediction for each sample with the actual known class label. The weights are modified for each training sample so as to minimize the error between the network's prediction and the actual class.

Algorithm for back propagation:

Step 1: Initialize the weights and biases in the network.

Step 2: While the terminating condition is not satisfied:

Step 3: For each training sample S:

Step 4: For each hidden or output layer unit h:

Step 5: compute the net input Ih = Σi Wih Oi + θh

Step 6: compute the output Oh = 1 / (1 + e^(-Ih))

Step 7: For each unit h in the output layer:

Errh = Oh (1 - Oh) (Th - Oh)

Step 8: For each unit h in the hidden layers:

Errh = Oh (1 - Oh) Σk Errk Whk

Step 9: For each weight Wih in the network:

ΔWih = (l) Errh Oi

Wih = Wih + ΔWih

Step 10: For each bias θh in the network:

Δθh = (l) Errh

θh = θh + Δθh

where Oi is the output of unit i in the previous layer, θh is the bias of unit h, Th is the true (target) output of unit h, and l is the learning rate.
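These update rules can be written compactly with NumPy. The following is a minimal, hypothetical sketch of training a single-hidden-layer network of sigmoid units; the layer sizes, learning rate and toy data are illustrative assumptions.

```python
# Minimal backpropagation sketch for one hidden layer of sigmoid units.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 4))                          # 20 samples, 4 attributes (toy data)
T = (X.sum(axis=1) > 2).astype(float)[:, None]   # toy binary targets

n_in, n_hid, n_out, l = 4, 5, 1, 0.1             # l is the learning rate
W1, b1 = rng.normal(0, 0.5, (n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(0, 0.5, (n_hid, n_out)), np.zeros(n_out)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    # Forward pass: Ih = sum_i Wih Oi + theta_h ;  Oh = sigmoid(Ih)
    O_hid = sigmoid(X @ W1 + b1)
    O_out = sigmoid(O_hid @ W2 + b2)

    # Error terms: Errh = Oh (1 - Oh)(Th - Oh) at the output layer,
    # and Errh = Oh (1 - Oh) sum_k Errk Whk at the hidden layer.
    err_out = O_out * (1 - O_out) * (T - O_out)
    err_hid = O_hid * (1 - O_hid) * (err_out @ W2.T)

    # Weight and bias updates: delta Wih = l * Errh * Oi, delta theta_h = l * Errh.
    W2 += l * O_hid.T @ err_out
    b2 += l * err_out.sum(axis=0)
    W1 += l * X.T @ err_hid
    b1 += l * err_hid.sum(axis=0)

print("Training accuracy:", ((O_out > 0.5) == (T > 0.5)).mean())
```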

Applications of neural networks:

Nowadays neural networks are used in many applications. A few of them are:

Investment analysis:

Neural networks are used to predict the movement of stock markets, currencies, etc., from previous data.

Monitoring :

Neural networks are used to monitor the state of aircraft engines.

Marketing :

Neural networks are also used to improve marketing mailshots; they can be used to learn how clients respond to mailshots [13].

Clustering

Clustering is the process of grouping a set of objects into classes of similar objects. Objects in the same cluster are similar to one another and dissimilar to the objects in other clusters. Clustering is also known as unsupervised learning. Due to the huge amounts of data collected in data warehouses, clustering has become an active research topic in data mining. Clustering is often performed as a preliminary step in a data mining process, with the resulting clusters being used as further inputs to a different technique downstream, such as neural networks. Due to the enormous size of many present-day databases, it is often helpful to apply cluster analysis first, to reduce the search space for the subsequent algorithms.

Requirements of clustering in data mining are

Scalability

Ability to deal with different types of attributes

Discovery of clusters with arbitrary shape

Minimal requirements for domain knowledge to determine input parameters

Ability to deal with noisy data

High dimensionality

Constraint based clustering

Interpretability and usability.

Some of the major clustering methods include partitioning methods and hierarchical methods. Given a database of n objects, partitioning methods construct k partitions of the data, where each partition represents a cluster and k ≤ n. There are approximately k^n / k! ways of partitioning a set of n data points into k subsets.

The two most popular partitioning algorithms are:

K-Means algorithm:

The K-means algorithm is one of the most commonly used partitioning algorithms. The "K" in its name refers to the fact that the algorithm looks for a fixed number of clusters, which are defined in terms of the proximity of data points to each other. In the K-means algorithm each cluster is represented by the mean value of the objects in the cluster. Even though K-means is the simplest and most commonly used algorithm, it is very sensitive to noise and outlier data points, because a small number of such points can substantially influence the mean value.

K-Medoids algorithm: In the K-medoids algorithm each cluster is represented by one of the objects of the cluster located near its centre.

Hierarchical methods create a hierarchical decomposition of the database. In hierarchical clustering, a tree-like cluster structure is created through recursive partitioning or combining of existing clusters. There are two categories of hierarchical clustering: agglomerative and divisive methods. Agglomerative hierarchical clustering, also called the bottom-up strategy, starts with each object in its own cluster and merges these atomic clusters into larger clusters until all of the objects are in a single cluster or a certain termination condition is satisfied. Divisive hierarchical clustering, also known as the top-down strategy, does the reverse: it starts with all objects in one cluster and partitions the cluster into smaller and smaller units until each object forms a cluster on its own or a termination condition is satisfied.
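To illustrate both families of methods, here is a small scikit-learn sketch that runs K-means (partitioning) and agglomerative (hierarchical) clustering on the same toy data; the generated data and the choice of k = 3 are illustrative assumptions.

```python
# Partitioning vs. hierarchical clustering on the same toy data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)   # toy 2-D data

# K-means: each cluster is represented by the mean of its members.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-means cluster centres:\n", kmeans.cluster_centers_)

# Agglomerative (bottom-up) hierarchical clustering with Ward linkage.
agglo = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print("Agglomerative cluster sizes:", [(agglo.labels_ == c).sum() for c in range(3)])
```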

Clustering has a variety of applications, including pattern recognition, spatial data analysis, biological studies, and market and customer segmentation. Clustering can also be used as a preprocessing tool in data mining.

2.1.5 Associative classification

Associative classification is a recent and rewarding technique that integrates association rule mining and classification. It is a special class of association rule mining in which the class attribute is considered in the rule consequent. For example, in a rule A => C, C must be a class label. Associative classification often achieves higher accuracy than traditional classification methods, and many of the rules found by associative classification cannot be discovered by traditional classification techniques. Associative classification is especially suited to applications where maximum accuracy is desired. Associative classification was first introduced by Liu et al. [15].

Associative classification generally involves two strategies

Generate class association rules from a training data set.

Classify the test data set into predefined class labels.

The rule generation phase in associative classification is a hard step that requires a large amount of computation. A rich rule set is constructed after applying suitable rule pruning and rule ranking strategies. This rule set, which is generated from the training data set, is used to build a model that predicts the cases present in the test data set.

The main task of associative classification is to discover a subset of rules that satisfy minimum support and high confidence; this subset is used to build the classifier, which is then used to predict the class label of an unknown sample. There are two methods of associative classification: 1) eager associative classification and 2) lazy associative classification [16]. Eager associative classification constructs a generalized model to predict the class label, whereas lazy associative classification delays the processing of data until a new sample needs to be classified and does not build a model beforehand. The lazy approach improves accuracy but leads to high computation cost; adopting an information-centric attribute approach can reduce this cost [17].
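A toy sketch of the eager strategy follows: class association rules are generated from labelled transactions, ranked by confidence, and the first matching rule classifies a new sample. The thresholds, data and ranking scheme are hypothetical simplifications, a stand-in for CBA-style algorithms rather than the method used in this thesis.

```python
# Toy eager associative classification: mine class association rules, rank, classify.
from itertools import combinations

# Labelled transactions: (set of symptoms/risk factors, class label).
train = [
    ({"chest_pain", "high_bp"}, "disease"),
    ({"chest_pain", "smoker"}, "disease"),
    ({"high_bp", "smoker"}, "disease"),
    ({"normal_bp"}, "healthy"),
    ({"normal_bp", "active"}, "healthy"),
]
min_support, min_confidence = 0.2, 0.7

def class_association_rules():
    items = {i for t, _ in train for i in t}
    rules = []
    for size in (1, 2):
        for antecedent in combinations(sorted(items), size):
            a = set(antecedent)
            covered = [(t, c) for t, c in train if a <= t]
            if len(covered) / len(train) < min_support:
                continue
            for label in {c for _, c in covered}:
                conf = sum(c == label for _, c in covered) / len(covered)
                if conf >= min_confidence:
                    rules.append((a, label, conf))
    # Rank rules by confidence, then by antecedent size (more specific first).
    return sorted(rules, key=lambda r: (-r[2], -len(r[0])))

rules = class_association_rules()

def classify(sample, default="healthy"):
    for antecedent, label, _ in rules:
        if antecedent <= sample:        # first matching rule wins
            return label
    return default

print(classify({"chest_pain", "high_bp", "smoker"}))   # expected: "disease"
```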

2.2 Applications of Data mining techniques to health care data

Information technologies have found wide application in health care. Data mining provides a user-oriented approach to extract knowledge from data. The discovered knowledge can be used by the health care industry to improve the quality of services; it can be used to improve public health and the care of the system's users, to reduce the number of adverse drug effects, and to save time, money and human lives. The following are some of the important areas where data mining techniques can be applied in the health care industry [18]:

Public health informatics

Health insurance

E-governance in health care

Anticipating patients' future behavior given their history

Forecasting treatment cost and demand for resources

Predictions of risk score for a particular disease.

Health care information systems contain huge amounts of data, including information on patients and data from diagnostic tests, which are too complex and voluminous to be processed by traditional methods. With the use of different data mining techniques, useful information can be found in these data and later used for research and report evaluation. The following are some of the data mining techniques that have been used successfully in health care:

Association rule mining

Classification

Clustering

Associative classification

Text mining

Genetic algorithms

Artificial neural networks

K-Nearest neighbor method

Heart disease

Heart disease is a term that includes all types of disease affecting the various components of the heart; all heart diseases belong to the category of CVD. The common cause of all heart diseases is inadequate pumping of oxygen and blood from the heart to the rest of the body and vice versa. Heart disease is a general name for a wide variety of diseases, disorders and conditions that affect the heart and sometimes the blood vessels as well. Heart disease is the number one killer of women and men in developing countries. Symptoms of heart disease vary depending on the specific type of heart disease; a classic symptom is chest pain. Although heart disease occurs in different forms, there is a common set of risk factors that influence whether someone will be at risk for heart disease or not. These risk factors include age, gender, hypertension, diabetes, high cholesterol, obesity, raised blood glucose and a sedentary lifestyle.

Different types of heart disease are

Ischemic heart disease:

Ischemic heart disease, also known as coronary heart disease, is the most common type of heart disease in developing and developed countries around the world. Coronary heart disease refers to problems with the circulation of blood to the heart muscle. A partial blockage of one or more of the coronary arteries results in a lack of oxygen supply, causing chest pain and shortness of breath; a complete blockage of an artery causes a heart attack. High cholesterol is one of the most prominent factors that can increase the build-up of fatty deposits. Another factor that causes coronary heart disease is the use of cigarettes and tobacco.

Congestive heart failure :

Congestive heart failure is a condition in which the heart does not pump adequate blood to the other organs of the body. Congestive heart failure often results from heart disease as well as constricted arteries. Symptoms of congestive heart failure are edema, swelling, shortness of breath and kidney problems.

Pulmonary heart disease:

Pulmonary heart disease arises from lung and heart complications: blood flowing into the lungs is slowed or even blocked, increasing pressure on the lungs. Symptoms of pulmonary heart disease include chest pain, shortness of breath, syncope and dyspnea.

Rheumatic heart disease:

Rheumatic heart disease frequently derives from throat infections, which initially begin as minor infections and worsen if not treated in time.

Congenital heart disease:

Congenital heart disease, or hereditary heart disease, is passed down through the family. This particular type of disease is unpreventable and inevitable. Congenital heart disease refers to a problem with the heart's structure and function due to abnormal heart development before birth. It is the most common type of birth defect and is responsible for more deaths in the first year of life than any other birth defect.

Chronic disease risk factors

According to the WHO 2010 report on chronic diseases, the major risk factors for chronic disease are:

Use of tobacco

The harmful use of alcohol

Hypertension

Raised cholesterol levels

Obesity

Unhealthy diet

Raised blood glucose

Diabetes

Risk factors for individuals are classified as

Background risk factors:

These risk factors include age, gender, level of education and genetic composition

Behavioral risk factors:

These include cigarette smoking, use of tobacco, physical inactivity and an unhealthy diet.

Intermediate risk factors:

These factors include blood lipids, diabetes, high BP and obesity.

For communities, the main risk factors include

Social and economic conditions

Environment

Culture

Urbanization

Heart disease in India:

India is undergoing a rapid epidemiological transition as a consequence of economic and social change, and cardiovascular disease is becoming an increasingly important cause of death. India, with a population of more than 1 billion, accounted for 60% of the world's heart disease cases in 2010. India's disease pattern has undergone a major shift over the past decade. As per the WHO report, at present eight out of every 10 deaths in urban India are caused by non-communicable diseases such as cardiovascular disease and diabetes; in rural India, six out of every 10 deaths are caused by NCDs [20]. According to the Coronary Artery Disease among Indians research foundation, 62 million people will suffer from heart disease by 2015 [19]. In India almost 25% of heart disease victims fall under the age of 40 years. Around 60,000 to 90,000 children develop heart disease in India and only 15,000 to 20,000 are cured [21]. In India alone, there are about 4,280 sudden cardiac deaths per lakh deaths annually. By the year 2030, India will rank among the countries at highest risk of heart disease [22]. CHD prevalence appears to be worsening in India, and is projected to rise more in India than in China and the established market economies over the years 1990-2020 [23]. Leeder et al. [24] estimated the total years of life lost due to CVD among Indian men and women aged 35-64 to be higher than in China and Brazil, as shown in the table.

Heart disease in Andhra Pradesh:

Studies to determine the precise cause of death in Andhra Pradesh have revealed that CVD causes about 30% of deaths in rural areas [25]. Several studies show a high prevalence of heart disease in Andhra Pradesh; sudden cardiac death contributed 10% of overall mortality in A.P. CVD was the leading cause of mortality, accounting for 32% of all deaths, a rate as high as in Canada (35%) and the U.S.A.

Andhra Pradesh is at risk of more deaths due to CHD. Hence a decision support system should be developed to predict the risk score of a patient, which will help in taking precautionary steps such as a balanced diet and medication, and in turn increase the lifetime of the patient.

Genetic Algorithm

Genetic algorithms are computing methodologies constructed in analogy with the process of evolution [26]. Genetic algorithms represent a powerful and general-purpose search method based on natural selection and genetics; they simulate natural processes based on the principles of Lamarck and Darwin. A simple genetic algorithm is a stochastic method that searches global search spaces, depending on some probability values. Genetic algorithms are typically used for problems that cannot be solved efficiently with traditional techniques.

Basically, three genetic operators are used in a genetic algorithm, and the driving force behind the genetic algorithm is the unique cooperation between them. The operators are:

1) Selection:

Selection is an operator applied to the current population in a manner similar to natural selection in biological systems: the fittest individuals are promoted to the next population and poorer individuals are discarded.

2) Crossover:

Crossover is the exchange of genetic material denoting rules, structural components and features of a machine learning, search or optimization problem. Various types of crossover operators are:

1) Single point

2) Two point

3) Uniform

4) Half uniform

5) Reduced surrogate crossover

6) Shuffle crossover

7) Segmented crossover [27].

3) Mutation:

Mutation alters the new solutions so as to add stochasticity to the search for better solutions. This operator is applied to an individual: a single bit of an individual's binary string can be flipped with respect to a predefined probability.

Fitness function:

There must be a fitness function to evaluate each individual's fitness. The fitness function is a very important component of the selection process, since the offspring for the next generation are determined by the fitness values of the present population.

Pseudo code for genetic algorithm:

Begin

Step 1: t=0

Step 2: initialize population P(t)

Step 3: compute fitness P (t)

Step 4: t=t+1

Step 5: if termination criterion achieved go to step 10

Step 6: select P (t) from P (t-1)

Step 7: crossover P (t)

Step 8: Mutate P (t)

Step 9: go to step 3

Step 10: output best and stop

End
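Below is a minimal Python sketch of the pseudocode above, applied to a deliberately simple toy fitness function (maximizing the number of 1-bits in a binary chromosome). The population size, rates, tournament selection scheme and fitness function are illustrative assumptions, not choices made in this thesis.

```python
# Minimal genetic algorithm sketch: selection, single-point crossover, bit-flip mutation.
import random

random.seed(0)
CHROM_LEN, POP_SIZE, GENERATIONS = 20, 30, 50
CROSSOVER_RATE, MUTATION_RATE = 0.9, 0.02

def fitness(chrom):
    return sum(chrom)                      # toy objective: count of 1-bits

def select(population):
    # Tournament selection: the fitter of two random individuals survives.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if random.random() < CROSSOVER_RATE:   # single-point crossover
        point = random.randint(1, CHROM_LEN - 1)
        return p1[:point] + p2[point:]
    return p1[:]

def mutate(chrom):
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in chrom]

population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for t in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print("Best fitness:", fitness(best), "chromosome:", best)
```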

The main motivation for using genetic algorithms in the discovery of high-level prediction rules is that the discovered rules are highly comprehensible and have high predictive accuracy.

Related work

Numerous works in the literature on heart disease using different data mining techniques have motivated our work. Some of these works are discussed below.

A study of various data mining techniques in health care was carried out by Harleen Kaur et al. [18]; they examined the potential use of classification-based data mining techniques, such as decision trees, rule-based classifiers and artificial neural networks, on health care data. An intelligent heart disease prediction system using decision trees, naïve Bayes and neural networks was proposed by Sellappan Palaniappan et al. [28]; they developed a user-friendly, web-based prototype model using .NET that extracts hidden knowledge from a heart disease database. Analysis of medical data using formal concept analysis was proposed by Anamika Gupta et al. [29]; they analyzed medical diagnosis data using classification and formal concept analysis, and their model helps in finding redundancies among the various tests used in the diagnosis of heart disease. A heart attack prediction system using artificial neural networks and data mining was proposed by Shantakumar Patil et al. [30]; they employed a multilayer perceptron with back propagation as the training algorithm: the heart disease data warehouse is first preprocessed, then clustered using K-means, and frequent patterns are mined using the MAFIA algorithm. Heart disease prediction using data mining and soft computing methods was proposed by Akira Hara et al. [31]; they reported an overview of the CHD-DB and compared the advantages and disadvantages of various methods. Heart disease prediction with feature subset selection using a genetic algorithm was proposed in [32]; in their work a genetic algorithm is used to determine the attributes that contribute most towards the diagnosis of heart disease, reducing thirteen attributes to six, and three classifiers (decision tree, classification via clustering and naïve Bayes) were used for diagnosis. Prediction of a risk score for heart disease using machine intelligence was proposed by K. Rajeswari et al. [33]; they designed a clinical decision support system for a heart disease risk score for the Indian population. Mai Shouman et al. [34] investigated applying K-NN to diagnosing heart disease; their methodology integrates voting with KNN in diagnosing heart disease patients. This thesis investigates applying various data mining techniques for the prediction of a risk score for heart disease for the Andhra Pradesh population.

CHAPTER 3

This chapter is devoted to the presentation of our proposed algorithms: knowledge discovery from mining association rules for heart disease prediction, and cluster-based association rule mining for heart attack prediction.


