A Review On Application Of Particle Swarm


02 Nov 2017



Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidhyalaya, Bhopal (M.P.), India

Abstract:

Bioinformatics is an emerging interdisciplinary research area that holds great promise for research and development in complex areas such as medicine, biology, agriculture, environment, public health and drug design. It is a blend of computer science and molecular biology. Most problems in Bioinformatics are NP-hard, so researchers have used soft computing and artificial intelligence techniques to solve them. Recently, Swarm Intelligence techniques have been gaining the attention of researchers working on Bioinformatics problems because of their ability to generate good approximate solutions at low cost. Among the various Swarm Intelligence algorithms, Particle Swarm Optimization (PSO) is used in many applications and has proved to be very effective. This paper reviews and discusses some representative methods to illustrate the basic concepts of PSO and how PSO has been applied to solve Bioinformatics problems. These representative examples include RNA Secondary Structure Prediction, Gene Clustering, Phylogenetic Tree Construction, Energy Minimization and Protein Modeling. The aim of this paper is to provide an overall understanding of PSO and its place in Bioinformatics so as to motivate researchers to develop new applications and new concepts.

Keywords: Particle Swarm Optimization, Bioinformatics, RNA Secondary Structure Prediction, Gene Clustering, Phylogenetic Tree Construction, Energy Minimization, Protein Modeling.

1. Introduction:

Nowadays, Bioinformatics, or "Biological Informatics", is one of the fastest emerging fields of the 21st century. Bioinformatics [5] is defined as the application of computational and analytical tools to capture and interpret biological data. The Human Genome Project (HGP) created a demand for experts who understand not only biology but also computer science, and who are able to interpret the vast amount of data generated by this type of research. Bioinformatics is often focused on obtaining biologically oriented data such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and interactions. The aims of Bioinformatics are therefore:

To organize these data into databases so that data can be easily accessed and updated by the researchers.

To devise methods to integrate the related data from different sources.

To develop methods for the analysis, management and interpretation of these data.

Bioinformatics poses many problems, including:

Clustering of gene expression data

Molecular docking problem

Multiple sequence alignment problem (MSA)

Phylogenetic tree construction

RNA secondary structure prediction

Protein secondary structure prediction

Fragment assembly problem (FAP)

Identifying gene regulatory network

Protein tertiary structure prediction & folding

Characterization of metabolic pathways between different genomes

Most of the problems listed above are computationally hard to solve because of combinatorial explosion. Moreover, with advances in technology, the amount of biological information gathered by scientists has grown massively, and it is difficult to interpret and manage such a deluge of data. In addition, important hidden relationships and correlations may exist in the data. There is therefore a need for effective and efficient computational tools. Several researchers have been using Artificial Intelligence techniques for this purpose, and Swarm Intelligence is one of them.

Swarm Intelligence (SI) [5] is an innovative, artificial intelligence based distributed paradigm inspired by the behavior of social insects and animals. A swarm is a collection of simple autonomous agents, each of which can interact with its local environment and with other agents. There is no centralized control; instead, intelligence emerges from the cooperative collective behavior of the swarm. The simplicity, self-organization, robustness, flexibility, scalability and distributed nature of SI techniques make them suitable for many optimization problems in areas such as electric power systems, communication networks, shop scheduling, transportation and telecommunication. The most successful examples of optimization techniques inspired by swarm intelligence are Particle Swarm Optimization, Ant Colony Optimization, Artificial Bee Colony and Artificial Immune Systems.

Relevance of SI in Bioinformatics

SI algorithms are efficient, adaptive and robust search methods that produce near-optimal solutions and have a large amount of implicit parallelism, while several tasks in Bioinformatics involve the optimization of different criteria.

Problems in Bioinformatics seldom need the exact optimum solution; rather, they need robust, fast and near-optimal solutions, which SI algorithms are known to produce efficiently.

In Bioinformatics, new data and concepts are generated every day, and they update or replace the old ones. SI algorithms can be easily adapted to a changing environment without changing the design of the system.

Biological data often contain errors, and SI algorithms tolerate such errors better than deterministic algorithms do.

The search space is very large and discontinuous at several points.

To provide useful insights into PSO applications in Bioinformatics, we structure this paper as follows. Section 2 introduces the fundamental aspects of PSO. Section 3 reviews some published work on using PSO in RNA Secondary Structure Prediction. A review of the current literature on PSO based approaches to Gene Clustering is provided in Section 4. Section 5 discusses some research that illustrates how PSO can be applied to Phylogenetic Tree Construction. Applications of PSO in Energy Minimization and Protein Modeling are reviewed in Sections 6 and 7 respectively, and Section 8 presents the conclusion.

2. Particle Swarm Optimization

Particle Swarm Optimization is a population based heuristic global optimization technique, proposed by Kennedy and Eberhart [7], that follows the concept of the flocking behavior of birds or the schooling of fish. Its main strengths are a fast convergence rate, simple computation and easy realization, which compare favorably with other global optimization algorithms. In the PSO algorithm, the particles move through an N-dimensional space, searching for an optimum solution to the objective function.

Computation in PSO [24] is based on a collection (called a swarm) of fairly primitive elements (called particles). PSO is initialized with a random population of particles. Individual particles move gradually towards the positions of their own and their neighbors' best previous performance in the multidimensional search space. Each particle maintains its position (the candidate solution), its evaluated fitness and its velocity. Additionally, it remembers the best position it has achieved so far. Finally, the swarm maintains the global best position achieved among all particles. The PSO algorithm consists of the following steps:

Evaluate the fitness of each particle

Update individual and global best fitness and positions

Update velocity and position of each particle

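Fitness evaluation is performed by passing the candidate solution to the objective function. The individual and global best positions are updated by comparing the newly evaluated fitness against the previous best and replacing the best as necessary. In every iteration, all the particles of the swarm move in the N-dimensional space in search of the global optimum. In the standard form of the algorithm, the updating equations for the velocity and position of each particle are:

$$V_i(t+1) = V_i(t) + C_1 R_1 \left(P_i - X_i(t)\right) + C_2 R_2 \left(G - X_i(t)\right)$$

$$X_i(t+1) = X_i(t) + V_i(t+1)$$

Where

V_i(t) : velocity of the i-th particle at iteration t

X_i(t) : position of the i-th particle at iteration t

P_i : best position found so far by the i-th particle (individual best)

G : best position found so far by any particle in the swarm (global best)

R1, R2 : random numbers in (0,1)

C1, C2 : acceleration constants for the cognitive component and the social component, in (0,2)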
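The three steps above (evaluate, update the bests, update velocity and position) can be sketched in a few lines of Python. This is a minimal illustration rather than any particular author's implementation: the function name `pso`, the test function `sphere`, the search bounds, the velocity limit `v_max` and the default parameter values are all assumptions chosen for the example. The rule used is the standard one, with the new velocity equal to the old velocity plus C1·R1·(individual best − position) plus C2·R2·(global best − position).

```python
import random


def sphere(x):
    """Hypothetical objective used only for illustration: sum of squares."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=30, iters=200, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Minimise `objective` with a basic (inertia-free) particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Individual bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive term
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                # Assumption: clamp velocity and position to keep the swarm bounded.
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            fit = objective(pos[i])
            if fit < pbest_fit[i]:                 # update individual best
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:                # update global best
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling `pso(sphere, dim=2)` returns the best position found and its fitness. Because the global best is only ever replaced by a better value, the returned fitness can never be worse than that of the best initial particle.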

Advantages of the basic particle swarm optimization algorithm:

1. PSO is based on swarm intelligence. It can be applied in both scientific research and engineering.

2. There is no centralized control structure. Hence the failure of any single particle does not stop the search, which makes the process robust.

3. The calculation in PSO is very simple. Compared with other optimization methods, it offers strong optimization ability and is easy to implement.

4. Only a few parameters need to be adjusted.

Despite these advantages, the basic PSO algorithm has the disadvantage that it can easily get trapped in a local optimum. A flowchart of the PSO algorithm is shown in Figure 1.

Figure 1: Flowchart of Particle Swarm Optimization

3. PSO in RNA Secondary Structure Prediction

RNA is a versatile biopolymer which fulfills a number of important roles in living cells. It not only carries and recognizes genetic information as mRNA (messenger RNA) and tRNA (transfer RNA) but is also known to act as a catalyst in phosphodiester bond cleavage and ligation. It also plays an important role in development, in the immune system and in peptide bond catalysis. RNA is an important target and agent for the pharmaceutical industry. Other functions of RNA include controlling gene expression, modulating protein expression, serving in protein localization and determining diseases caused by RNA viruses.

To fully understand its mechanism of action, or to target an RNA sequence, the structure of RNA needs to be understood. RNA structure has three levels of organization. The first level, the primary structure, is the linear sequence of nucleotides. The secondary structure is the collection of canonical base pairs in the RNA structure. Finally, the tertiary structure is the three-dimensional arrangement of the atoms in the RNA molecule. Predicting RNA secondary structure is an important field in Bioinformatics. Physical methods such as X-ray diffraction and NMR spectroscopy are difficult, expensive and time consuming. Therefore a number of computational approaches have been developed to simplify the determination of RNA structure, including dynamic programming, genetic algorithms, simulated annealing, the harmony search algorithm and Particle Swarm Optimization.

Neethling and Engelbrecht [21] introduced a new set based particle swarm optimization algorithm (Set PSO) to optimize the structure of RNA molecules, modeling structure prediction as an energy minimization problem. Here the original PSO is modified to work on sets, where each set contains discrete elements that can be added to and removed from it. New addition, subtraction and distance operators are also introduced, and the positions and velocities of the particles are updated using these modified addition and subtraction operators. The solution provided by Set PSO is a set of elements that has been optimized with respect to the objective function used in the experiments. The experimental results on eight benchmark sequences showed that Set PSO was able to predict structures with relatively low mean free energies.

Geis and Middendorf proposed Helix PSO [8], which uses a modified PSO and the Vienna RNA package to determine the structure with minimum free energy. Helix PSO encodes a structure as a permutation of indexes from the set of all helices, H, and the fitness function is a combination of each structure's free energy and its similarity to a centroid structure. Each particle i has an associated set of candidate target positions Ti and, for each t ∈ Ti, a weight w(t) > 0. The relative weight of a position in Ti determines the probability that this position is chosen as the target. After each iteration, each weight is decreased by multiplication with a parameter p, 0 < p < 1. A position whose weight falls below a threshold is removed from Ti, and then the particle's pbest and gbest are added to Ti. Particles move towards the target by performing a number of transpositions in the particle vector in such a way that the particle moves closer to the target. The final solution returned by Helix PSO is not the structure with the lowest energy; instead, the algorithm counts all the different structures that appear throughout the experiments and picks the one that appears most frequently.
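The target mechanism described for Helix PSO can be illustrated with a small Python sketch. It is not the authors' code: the function names, the dictionary representation of Ti, the default values of `p` and `threshold`, the weight given to a newly added pbest or gbest, and the particular way of choosing transpositions are all assumptions made for the example.

```python
import random


def choose_target(targets, rng):
    """Pick a target position with probability proportional to its weight.

    `targets` maps a position (a tuple, i.e. a permutation) to a weight > 0.
    """
    positions = list(targets)
    weights = [targets[t] for t in positions]
    return rng.choices(positions, weights=weights, k=1)[0]


def update_targets(targets, pbest, gbest, p=0.9, threshold=0.1, new_weight=1.0):
    """Decay all weights by p, drop positions below the threshold, add pbest/gbest."""
    decayed = {t: w * p for t, w in targets.items()}
    kept = {t: w for t, w in decayed.items() if w >= threshold}
    # Assumption: pbest and gbest (re-)enter the set with a fresh weight.
    kept[pbest] = new_weight
    kept[gbest] = new_weight
    return kept


def move_towards(particle, target, n_swaps):
    """Apply up to n_swaps transpositions, each fixing one mismatched index."""
    current = list(particle)
    for _ in range(n_swaps):
        mismatches = [i for i, (a, b) in enumerate(zip(current, target)) if a != b]
        if not mismatches:
            break
        i = mismatches[0]
        j = current.index(target[i])       # where the wanted element currently sits
        current[i], current[j] = current[j], current[i]
    return tuple(current)
```

Each transposition in `move_towards` puts at least one element into the place it occupies in the target, so the particle gets strictly closer to the target permutation with every swap.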

4. PSO in Gene Clustering

Microarray techniques offer new insights into the biology of a cell by enabling researchers to measure the activity of many thousands of genes simultaneously. They help in understanding gene functionality and gene regulatory networks, and in drug discovery. However, because of the large number of genes, the interpretation of such a huge mass of data is a big challenge. The first step toward addressing this challenge is the use of clustering techniques, which identify interesting patterns in the underlying data. Cluster analysis partitions a given data set into groups based on specified features so that the data within the same group are more similar to each other than to the data in different groups. Gene clustering [28] is thus defined as the process of assigning genes to clusters based on similarities in their activity patterns (co-expressed genes). Genes with similar activity patterns should be grouped together, while genes with different activity patterns should be placed in distinct clusters, because genes with similar activity patterns are also functionally related and controlled by the same mechanism of regulation (co-regulated genes). A number of standard clustering algorithms, such as hierarchical clustering, K-means clustering, the self-organizing map (SOM) and the Genetic Algorithm (GA), have been used to cluster gene expression data.

In 2003, Xiao et al. [27] proposed a hybrid SOM/PSO algorithm in which SOM is used to cluster the data set in the first stage, and PSO is then used in the second stage to refine the clustering by optimizing the weights of the SOM. Another hybrid algorithm, combining PSO with a support vector machine (SVM), was proposed by Alba et al. [1] to classify genes from cancer data sets. A new version of PSO called geometric PSO, which uses a binary representation in Hamming space, is evaluated for the first time in that work. The authors reported a 100% classification rate.

In another work, Li et al. [14] combined PSO with GA and SVM. Here a PSO/GA hybrid is adopted to select the most important gene subsets, which are then used to train the SVM classifier. The experimental results on three data sets show an improvement in cluster formation and thus in classification accuracy.

Zhihua et al. [30] modified K-means by incorporating a particle pair optimizer and named the result the PK-means clustering algorithm. This hybridization enhances the performance and convergence rate, and the experiments show that PK-means outperforms K-means. Lam et al. [12] enhanced the performance of PK-means by introducing the concept of cluster matching, which is a two-step process. In the first step, the sequence of clusters contained in the particle position is matched with the clusters contained in the particle's global best position on the basis of nearest distance. The sequence of clusters contained in the current particle position is then rearranged according to the matching results. The authors reported that the proposed PSO-KM performs better than PK-means and K-means in terms of cluster compactness.

The memetic K-means algorithm MKMA [31] uses a Comprehensive Learning Particle Swarm Optimizer (CLPSO) based Memetic Algorithm (MA) to minimize the sum of the squared distances by combining global search and local search. In each iteration, CLPSO partitions the particle swarm into a leader group and a populace group based on fitness value. The authors conducted experiments on two gene expression data sets and reported that MKMA consistently attained better performance than K-means, fuzzy K-means and PK-means.

In 2012, Lam et al. [13] proposed another algorithm, XK-means, which uses the concept of an exploratory vector along with the hybridization of PSO and K-means. The exploratory vector is added to each centroid before a K-means iteration, and as a result the exploitation level is increased. The results reported by the authors indicate that the proposed method is faster than the K-means and PK-means algorithms and gives the best results in terms of cluster compactness and stability.

Sun et al. [25] proposed the Quantum Behaved Particle Swarm Optimization (QPSO) algorithm for gene clustering. In this work, a Multi-Elitist strategy for QPSO, known as Multi-Elitist QPSO (MEQPSO), is used to update the gbest position of the QPSO algorithm. As a result, MEQPSO has a stronger global search ability and better overall performance than the original PSO.
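In PSO based clustering of this kind, a particle typically encodes a full set of centroids and its fitness is the sum of squared distances from each data point to its nearest centroid. The sketch below shows that fitness under stated assumptions: the flat-list encoding of a particle, the function names `decode` and `sse`, and the toy data are illustrative and are not taken from any of the cited papers.

```python
def decode(particle, k, dim):
    """Split a flat particle vector into k centroids of `dim` coordinates each."""
    if len(particle) != k * dim:
        raise ValueError("particle length must equal k * dim")
    return [particle[i * dim:(i + 1) * dim] for i in range(k)]


def sse(centroids, points):
    """Sum over all points of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


# Toy expression profiles (hypothetical data): two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
particle = [0, 1, 10, 1]            # encodes centroids (0, 1) and (10, 1)
fitness = sse(decode(particle, k=2, dim=2), points)
```

A lower value means more compact clusters, so a swarm minimizing `sse` over particle vectors is searching for good centroid placements. Hybrid schemes such as those described above interleave this search with K-means iterations.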

5. PSO in Phylogenetic Tree Construction

Phylogenetics [22] is the study of the evolutionary histories of living organisms. It represents evolutionary divergences by finite directed (weighted) graphs or directed (weighted) trees, known as phylogenies. Based on molecular sequences, phylogenetic trees can be built to reconstruct the evolutionary tree of the species involved. In particular, the representation derived from gene or protein sequences is known as a gene phylogeny, while the representation of the evolutionary path of the species is often referred to as a species phylogeny. A gene phylogeny is, to some extent, a local description. It only describes the evolution of a particular gene or encoded protein, and this sequence could evolve faster or slower than other genes in the genome, or it may have a completely different evolutionary history from the rest of the genome. While in general the topology of a phylogenetic tree represents the relationships between the taxa, assigning scales to the edges can provide extra information on the amount of evolutionary divergence as well as the time of the divergence. There are two main types of trees: a) rooted trees, which have a single node from which all other nodes are derived, and b) unrooted trees, which do not originate from one clear node. The tree follows standard graph theory notation, where each species is represented as a node or leaf and the relationship between species is referred to as an edge or branch. The lengths of the branches represent the time estimate between the species.

Figure 2: (a) Unrooted Tree, (b) Rooted Tree.

There are many phylogenetic methods available today, each with its strengths and weaknesses. Most can be classified as either distance-based methods or character-based methods.

Distance-based methods estimate pairwise distances (dissimilarities) prior to computing a branch-weighted phylogenetic tree. If the pairwise distances are sufficiently close to the number of evolutionary events between pairs of taxa, these methods reconstruct a correct tree (Kim and Warnow 1999). This assumption is true for many models of biomolecular sequence evolution, in which case distance-based methods give sufficiently accurate results (Li 1997). The main advantage of distance-based methods is their low time complexity, which makes them applicable to the analysis of large data sets. The most commonly used methods are UPGMA and Neighbor-joining. UPGMA (Unweighted Pair-Group Method using Arithmetic averages; Rohlf 1963) was originally proposed for taxonomic purposes but can be used for inferring phylogenies under the assumption that the rate of nucleotide or amino acid substitution is the same for all evolutionary lineages. In contrast to UPGMA, Neighbor-joining (NJ) (Saitou and Nei 1987; Studier and Keppler 1988) is designed to correct for unequal rates of evolution in different branches of the tree. NJ has a low time complexity and, like other distance methods, performs well when the divergence between sequences is low. Computationally, tree generation by NJ is similar to UPGMA. When two nodes are linked, their common ancestral node is added to the reduced matrix and the terminal nodes with their respective branches are removed from it. Contrary to UPGMA, Neighbor-joining does not produce a dendrogram (ultrametric distance) but an additive tree (additive distance).
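To make the distance-based idea concrete, the following Python sketch implements a plain UPGMA on a small distance matrix. It is an illustrative sketch only: the function name `upgma`, the nested-tuple tree representation and the toy matrix are assumptions, and the code makes no attempt at the efficiency of production phylogenetics software.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a rooted tree by repeatedly merging the two closest clusters.

    labels: list of taxon names; dist: symmetric matrix (list of lists).
    Returns (tree, merges): tree is a nested tuple, merges lists
    (left_subtree, right_subtree, node_height) in merge order.
    """
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)                   # closest pair of clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        height = d.pop(pair) / 2.0                 # ultrametric node height
        merges.append((ti, tj, height))
        for k in clusters:
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            # Size-weighted average = arithmetic mean over all leaf pairs.
            d[frozenset((k, next_id))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, merges


labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
tree, merges = upgma(labels, dist)
```

For this toy matrix, A and B are joined first, then C, then D. The equal-rate assumption mentioned above is what justifies placing each internal node at half the distance between the clusters it joins.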

Character-based methods use the aligned characters, such as DNA or protein sequences, directly during tree inference. They include the Maximum Parsimony and Maximum Likelihood methods.

Maximum Parsimony infers phylogenetic trees by evaluating the possible mutations between sequences. In general terms, the aim of parsimony methods is to find the phylogenetic tree with minimum total length, that is, the tree with the smallest number of evolutionary changes explaining the observed data. There are several variations of parsimony. The two simplest and most widely used are the Fitch (Fitch 1971) and Wagner (Farris 1970) parsimonies. Fitch parsimony uses no constraints at all and allows any state to transform directly into any other state, whereas Wagner parsimony uses a minimum of constraints on permissible character-state changes and assumes that any transformation from one character state to another implies a transformation through any intervening states, as defined by the ordering relationship. The Wagner method assumes that characters are measured on an interval scale; thus, it is appropriate for binary, ordered multistate and continuous characters. The Fitch method allows unordered multistate characters (e.g. in nucleotide or protein sequences). Both methods permit free reversibility; that is, a change of character state in either direction is assumed to be equally probable, and character states may transform from one state to another and back again. A consequence of reversibility is that a tree may be rooted at any point with no change in tree length.
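The length of a given tree under Fitch parsimony can be computed with a simple bottom-up pass for each character. The sketch below handles one unordered character on a rooted binary tree. The nested-tuple tree encoding, the function name `fitch` and the example states are assumptions made for illustration.

```python
def fitch(tree, states):
    """Fitch bottom-up pass for one unordered character.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name -> observed character state.
    Returns (possible_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))
site = {"A": "G", "B": "G", "C": "T", "D": "G"}
root_states, changes = fitch(tree, site)
```

Summing `changes` over all aligned sites gives the tree length that a parsimony search, whether exhaustive or heuristic, tries to minimize across candidate topologies.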

The maximum likelihood approach for inferring phylogenies from sequence data was introduced by Felsenstein (1981). It assigns quantitative probabilities to mutational events, rather than merely counting them. This method compares possible phylogenetic trees on the basis of their ability to predict the observed data. The tree that has the highest probability of producing the observed sequences is preferred. Like maximum parsimony, maximum likelihood reconstructs ancestors at all nodes of each considered tree, but it also assigns branch lengths based on the probabilities of mutations. For each possible tree topology, the assumed substitution rates are varied to find the parameters that give the highest likelihood of producing the observed sequences. The main obstacle to the widespread use of maximum likelihood is computational time. Algorithms that find the maximum likelihood score must search through a multidimensional space of parameters, which makes the solution of large-scale problems (>100 sequences) extremely time consuming. Maximum likelihood estimation may also be subject to systematic errors. This happens if the model of evolution used to evaluate the likelihood of given trees does not reflect the actual evolutionary processes.

Lv et al. [18] proposed a novel algorithm for phylogenetic tree reconstruction in which a Discrete Particle Swarm Optimization (DPSO) is used to select the best tree from the population. In the proposed algorithm, the fitness value of each particle in the population is calculated first, and the individual with the maximum fitness value is then used for phylogenetic tree construction. Once the tree is constructed, population updating and branch adjustment are performed. In population updating, the positions and velocities are updated using the DPSO position and velocity update equations. In the next step, a comparison is made to adjust the branches of the tree: if the distance between two nodes is greater than or equal to 2D (where D refers to the distance between two sequences), the branch is separated; otherwise the branches are combined. These updates continue until the phylogenetic tree is optimized. The DPSO algorithm gives optimized results even if the initial population changes. The algorithm was applied to a 25-sequence problem involving sequences of the chloroplast gene rbcL from a diversity of green plants, and the experimental results were satisfactory when compared with other traditional algorithms.

6. PSO in Energy Minimization of a Molecule

Molecular modeling can be considered as the application of computerized techniques to analyze molecules and predict molecular, chemical and biochemical properties. The functions of molecular modeling include structure retrieval and generation, visualization, superposition, alignment, calculation of molecular properties, dynamic simulation, conformational search, and energy calculation and minimization. Knowing the stable conformation of a molecule is important because it allows us to understand its properties and behavior based on its structure. However, a molecule as initially built does not necessarily correspond to one of the stable conformers. It has been found that the lowest energy structure is related to the global minimum of the molecular potential energy function, so energy minimization is usually carried out to determine a stable conformer. Molecular energy minimization is one of the most challenging unsolved problems in molecular biophysics, and many researchers from computer science and optimization now pay close attention to this problem.

Many algorithms have been proposed for solving this problem, including simulated annealing, GA and branch and bound. Rong [23] proposed FM_PSO, which combines the filter method with particle swarm optimization to improve the ability of exploration and exploitation. The author also incorporates a divide and conquer method to speed up the convergence rate. In the proposed method, at the initial stage the whole swarm is divided into 2k sub-swarms by the divide and conquer method, and the best individual in each sub-swarm is updated according to the filter technique. As the generations proceed, pairs of sub-swarms are merged into one until all sub-swarms have been combined into a single swarm. The feasibility and effectiveness of the proposed FM_PSO were tested on 8 benchmark functions, and the method was then applied to predict the structure of a macromolecule. The results show that FM_PSO handles the high dimensional case better than the branch and bound method and the SPSO algorithm.

7. PSO in Protein Modelling

A protein is an organic heteropolymer in which several amino acids are linked together by peptide bonds. The primary structure of a protein folds into a three-dimensional configuration in order to perform its function. This folded functional state of the protein is called the native state. Functional characterization of proteins is one of the most frequent tasks in biology and can be accomplished by determining the tertiary structure of the desired protein. The tertiary structure is determined experimentally by either X-ray crystallography or NMR. According to the data collected, more than 46 million individual sequences have been produced from 51 billion known nucleotide bases, but only 35,701 of them have had their 3D structures solved experimentally using X-ray crystallography and NMR, because these techniques are time consuming and complicated. In addition, many proteins are too large for NMR and cannot be crystallized. Computational approaches, also termed Protein Modelling, therefore act as a substitute. Several computational methods can be used to fill the gap between sequence space and structure space. These approaches can be classified into two broad classes: comparative modeling and de novo (ab initio) modeling.

Comparative protein modeling uses previously solved structures as a starting point, or template, together with a scoring function that assesses the compatibility of the sequence with the structure, to yield a possible 3D model. These methods may be further split into two groups: homology modeling and protein threading. Homology modeling is the prediction of the 3D structure of a target protein from the amino acid sequence of a homologous protein for which an X-ray or NMR structure is available. It is the most widely used and most reliable theoretical method for predicting a protein structure from a sequence. Threading, or fold recognition, is the method by which a library of unique or representative structures is searched for structural analogies to the target sequence. It is based on the theory that there may be only a limited number of distinct protein folds.

De novo protein modeling methods seek to build 3D protein models from scratch. These methods assume that the native structure corresponds to the global free energy minimum accessible during the lifespan of the protein, and they attempt to find this minimum by exploring many conceivable protein conformations. The two key components of de novo methods are the procedure for efficiently carrying out the conformational search and the free energy function used for evaluating possible conformations. Two basic kinds of model have been developed: detailed models and Hydrophobic-Polar models. Detailed models consider the interactions between all atoms of the protein. The search space is therefore huge, with an overwhelming number of possible degrees of freedom and interactions between the different atoms. The energy function is usually based on molecular mechanics and force field components such as bond lengths, bond angles, dihedral angles, van der Waals interactions and electrostatic forces. Hydrophobic-Polar (HP) models represent each amino acid, with all of its atoms, as one bead labeled as either hydrophobic (H) or polar (P). In this model the beads lie at points defined by a lattice, placed by some chosen algorithm such that the most stable structure is the one with the hydrophobic amino acids lying in its core. The underlying concept is that hydrophobic amino acids tend to avoid contact with the solvent and hence move to the inside of the structure, whereas the polar ones remain on the outside. The main energy function used in this model is based on the total number of hydrophobic interactions between the amino acids, and the goal is to find a lattice conformation with minimum energy, i.e. with the maximum number of H-H contacts. HP models can be 2-dimensional (2D) or 3-dimensional (3D). The problem of predicting protein structures is intractable. Hence, heuristic and metaheuristic algorithms have been reported for finding good sub-optimal solutions. The applications of PSO to the protein structure problem by various researchers are discussed in detail below.
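The HP energy function is simple enough to write down directly. The sketch below evaluates a conformation on a 2D square lattice, where the fold is given as a string of unit moves. The move alphabet (`U`, `D`, `L`, `R`), the function names and the convention of returning `None` for self-overlapping folds are assumptions made for this illustration. Contacts between residues that are neighbors along the chain are excluded, as is usual in HP models.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a move string into lattice coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves):
    """Return minus the number of H-H lattice contacts, or None if the
    conformation is not self-avoiding."""
    coords = fold(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        return None                      # two beads on the same lattice point
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, `hp_energy("HPPH", "RUL")` folds the chain into a unit square, bringing the two H beads into contact and giving an energy of -1. A PSO for this model would search over move strings (or an equivalent encoding) for the lowest such energy.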

Liu [17] applies the PSO algorithm to search the ground state of toy model which is the simplest model to represent the protein structure. Experiments were conducted on both artificial data and a real protein data and it is found that PSO is effective to search for ground state of toy model. In 2007, Call [3] first time introduces PSO to perform global optimization of minimum structure search for chemical systems. Author introduces few modifications in the original PSO. First is that it uses two types of velocities, one for each units center of mass and the other for each unit angle. Both of these have their own Vmax. Another novel feature is that the best solution seen by the particle is sometimes not updated with a newly discovered best solution seen in the current iteration. Flexible initial population containing fragmented randomly generated linear and planar structures, enforcement of user defined minimum\maximum distance constraints between atoms, measure of the similarity of structures using a distance metric are few more novel features added by the authors .Simulation results on three chemical structures demonstrate the efficiency of PSO to effectively find global minimum structures. PSO requires small population size and converges fast as compare to simulated annealing (SA) and genetic algorithm (GA) . Meissner et. Al [20] introduces Constriction type PSO (CPSO) as an optimization technique for protein structure prediction. In this research work a course grained "beads on string" backbone model is used and every particle in a swarm represents a distinct backbone conformations. Root mean square deviation (RMSD) is used as a fitness function and some other scoring function is also applied to efficiently measure the fold similarly. Simulation results show that PSO is capable of optimizing backbone geometries and generates a good solution in refolding studies yielding near native structure for two small sample proteins. 
Zhang and Li [29] proposed a toy-model-based PSO for the protein folding problem. Their architecture consists of three parts: an elitist part, an exploitative part and an explorative part. By incorporating both local and global search, the authors showed that the proposed algorithm is effective in searching for the native state of proteins with the lowest free energy. Datta et al. [6] hybridized an artificial neural network (ANN) with particle swarm optimization to predict the tertiary structure of proteins, using an ab initio approach based on global minimization of an energy function. A three-layered ANN trained with the back-propagation algorithm predicts the side chain dihedral angles, while PSO optimizes the CHARMM energy function to find the main chain dihedral angles. The authors report that this algorithm outperforms the classical techniques in 80% of cases and also reduces the dimensionality of the search space.

Lin et al. [15] proposed an efficient hybrid Taguchi genetic algorithm for solving the protein folding problem in the 2D HP model. This algorithm combines the global exploration capability of the genetic algorithm with the strong exploitation capability of the Taguchi method, and PSO is used to improve the mutation mechanism. The algorithm was tested on 2D HP benchmark proteins and showed superior performance in comparison with the genetic algorithm, the ant colony algorithm, Monte Carlo, and tabu search with the genetic algorithm. Kanj et al. [10] applied PSO to the protein structure prediction problem in the 3D HP model. The proposed algorithm starts with a small population of candidate solutions and then gradually explores the search space to find structures with minimum energy. It was tested on two sets of benchmark sequences of different lengths and outperformed the existing algorithms. Bauto et al. [2] extended binary PSO to predict the tertiary structure of proteins in lattice models. They introduced a new discrete PSO and a Roulette PSO, which uses the roulette wheel selection of the GA. Simulation results on six proteins with three lattice models and two folding encodings indicate that the new algorithm performs efficiently and is able to find conformations of minimum energy.

In order to reduce the computation time of the protein folding problem, Hernandez et al. [9] implemented PSO in a distributed computing environment and named it parallel PSO. While predicting the minimum-energy 3D structure of a protein, parallel PSO considers the structural restrictions of the protein, with the conformation represented by the torsion angles of the backbone and the side chains. Energy is calculated using the empirical energy function ECEPP/3, and the results show that the proposed algorithm is comparable to existing algorithms. In order to enhance the performance of protein structure prediction in the 3D HP model, Cheng Jian Lin et al. [16] proposed a hybrid genetic-algorithm-based PSO (HGA-PSO). In the first stage the genetic algorithm is applied and PSO is used as a mutation operator, which encourages the particles to move towards their own best positions. The simulation results reported by the authors show that HGA-PSO performs better than existing evolutionary algorithms. In the PSO-SQP approach to protein folding of Wang [26], the training process is divided into two phases. In the first phase the particles are trained with standard PSO, and in the second phase sequential quadratic programming (SQP) is used to fine-tune the local search. The SQP method divides the problem into a sequence of subproblems, each of which optimizes a quadratic model of the objective subject to a linearization of the constraints. Experimental results on four chains of different lengths show that PSO-SQP outperforms both GA and PSO.
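
Several of the studies above evaluate particles with an HP lattice energy. As a hedged illustration of the standard form of that energy on a 3D cubic lattice, the sketch below counts hydrophobic contacts between residues that are lattice neighbours but not chain neighbours, and assigns an energy of -1 to each. The function names, the string encoding of the sequence and the choice to return `None` for an overlapping conformation are assumptions made for this example, not details taken from the cited papers.

```python
def is_self_avoiding(coords):
    """True if no two residues occupy the same lattice site."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence, coords):
    """Energy of a conformation in the 3D HP cubic lattice model.

    sequence: string over {'H', 'P'}; coords: list of (x, y, z) integer
    tuples, one per residue. Returns minus the number of H-H contacts,
    or None when two residues collide (invalid conformation).
    Chain connectivity is assumed and is not checked here.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if not is_self_avoiding(coords):
        return None

    index_at = {site: i for i, site in enumerate(coords)}
    # Only the three positive directions are scanned, so each neighbouring
    # pair of sites is visited exactly once.
    directions = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in directions:
            j = index_at.get((x + dx, y + dy, z + dz))
            # Consecutive residues are bonded and do not count as contacts.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
energy = hp_energy("HPPH", square)
```

A PSO-based predictor would call such a function as its fitness, with each particle decoded into lattice coordinates before evaluation.
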
In the algorithm of [4], Lévy flight is combined with PSO to solve the protein folding problem using the 3D AB off-lattice model. Lévy flight is a local search method in which the step length is drawn from a probability distribution with a power-law tail, based on chaos theory. For the experiments, Fibonacci sequences of hydrophobic and hydrophilic amino acids are first generated randomly. The algorithm is run for up to 2000 iterations with a population size of 100. Fifty independent runs of the proposed algorithm on four real protein sequences from the PDB database show that it outperforms other existing algorithms.

Mansour et al. [19] introduced PSO with a repair algorithm to solve the protein structure prediction (PSP) problem in the 3D HP model. The proposed algorithm starts with a random population of particles, and each particle is evaluated using the energy function of the 3D HP lattice structure. At every iteration the swarm is updated using the PSO velocity update equation with a certain probability (Rate). If two or more amino acids lie at the same point on the cubic 3D lattice, a collision occurs and the particle is termed invalid. The invalid particle is repaired by a repair algorithm, which searches locally for an alternative empty location for the amino acid that causes the collision. If none is available, it tries to find previous amino acids whose location can be modified. If more than three amino acids have been searched, or if none can be modified, the particle is assumed to be unrepairable and the initial input particle is returned. A new particle replaces the old one if its energy is lower than or equal to that of the old particle. The experimental results reveal that the proposed PSO performs better when tested on proteins of 27 and 64 amino acids in length.

Kondov [11] applied PSO to protein structure prediction based on an all-atom force field. Four variants of PSO, namely the classical and linear update schemes, each with and without periodic boundary conditions (PBC), were investigated for a series of peptides with 28 to 64 optimization dimensions. The author reported that although the classical update scheme yields faster and more accurate structure prediction, the inclusion of PBC improves the accuracy and efficiency of both update schemes for all peptides. The performance of synchronous and asynchronous parallel PSO was also investigated, and asynchronous parallel PSO was found to be better for any number of workers.
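
One plausible way to impose periodic boundary conditions on a PSO search is sketched below. It assumes, purely for illustration, that the optimization variables are angles in degrees with a period of 360, and it uses a generic inertia-weight velocity update; neither the encoding nor the parameter values (w, c1, c2) nor the function names are taken from [11]. Positions are wrapped back into the periodic interval, and the attraction terms use the shortest signed difference across the boundary.

```python
import random


def wrap(value, period=360.0):
    """Map a value into the interval [-period/2, period/2)."""
    half = period / 2.0
    return (value + half) % period - half


def periodic_difference(a, b, period=360.0):
    """Shortest signed difference a - b on a periodic axis."""
    return wrap(a - b, period)


def pso_step_pbc(pos, vel, pbest, gbest,
                 w=0.7, c1=1.5, c2=1.5, period=360.0, rng=random):
    """One inertia-weight PSO step for a single particle under PBC."""
    new_pos, new_vel = [], []
    for x, v, p, g in zip(pos, vel, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Attraction follows the shorter way round the periodic axis.
        v_new = (w * v
                 + c1 * r1 * periodic_difference(p, x, period)
                 + c2 * r2 * periodic_difference(g, x, period))
        new_vel.append(v_new)
        # The particle re-enters from the opposite side instead of
        # being clamped at the boundary.
        new_pos.append(wrap(x + v_new, period))
    return new_pos, new_vel


# A particle near +180 attracted to a best position just across the boundary.
new_pos, new_vel = pso_step_pbc(
    pos=[175.0, -175.0], vel=[10.0, -10.0],
    pbest=[-170.0, 170.0], gbest=[-170.0, 170.0],
    rng=random.Random(1),
)
```

Without the wrapping, the particle at 175 would see a best position 345 degrees away rather than 15, which is one intuition for why PBC can help when the variables are angular.
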

Conclusion:

Bioinformatics is the application of computer technology to the management and analysis of biological data. The field is data driven and aims at uncovering the knowledge hidden in the mass of data so as to obtain deep insight into the fundamental biology of organisms. As biological data is growing exponentially, there is a need for rapid surveys of the published literature that allow researchers to conduct informed work, avoid repetition and generate new hypotheses. Since the introduction of PSO to this field, a number of PSO variants have been designed and applied to Bioinformatics problems, but such applications are still few. The aim of this paper is therefore to present a review of the application of Particle Swarm Optimization in Bioinformatics and to inspire further research on new applications and new concepts that exploit PSO.

The outcome of this review demonstrates the need to improve the existing tools that are already being applied to Bioinformatics problems. There are also some issues with PSO in relation to Bioinformatics. Firstly, the basic velocity updating scheme in PSO is common to all applications, so problem-specific operators need to be designed. Secondly, PSO is parameter dependent, and identifying an appropriate range of parameter values for different Bioinformatics tasks requires extensive experimentation, which is tedious and time consuming. Lastly, PSO and its variants involve a large degree of randomness, and different runs of the same program may yield different results. It is therefore necessary to incorporate problem-specific domain knowledge into the SI tools to reduce randomness and computational time, and current research should progress in this direction as well.


