MicroRNA-Disease Predictions Based On Genomic Data

Gene Ontology (GO) is a structured vocabulary of concepts that are linked to one or more gene products through a process called annotation. Association rule mining discovers biologically relevant associations among these annotations. The existing system used Gene Ontology-based weighted association rules to extract rules from annotated datasets. Here we adapt the MOAL algorithm to mine cross-ontology association rules. The cross-ontology rules use protein values from the three GO sub-ontologies to identify the disease associated with an affected gene, focusing on intrinsic and extrinsic values. We also study the co-regulatory modules among microRNAs, transcription factors and genes at the functional level using multiple genomic data sources. The regulations are compared with the help of an integration technique. An iterative multiplicative updating algorithm is used to solve the optimization function for these interactions. Finally, we compare the regulatory modules with the protein value for each gene and generate a Bayesian rose tree to assess the efficiency of our result.


INTRODUCTION
MicroRNAs (miRNAs) and transcription factors (TFs), two vital classes of gene regulatory molecules in multicellular organisms, share a common regulatory logic. MicroRNAs are a family of small, non-coding RNAs that regulate gene expression in a sequence-specific manner and participate at the post-transcriptional level in the regulation of numerous cellular processes, such as cancer progression. TFs are proteins that control gene regulation at the transcriptional level by binding to cis-regulatory elements in the gene promoter region. By activating or repressing their target genes, TFs can regulate the global gene expression program of a living cell and form transcriptional regulatory networks. However, elucidating the co-regulation mechanisms between miRNAs and TFs remains a challenge.
Recently, researchers have studied the co-regulation of miRNAs and TFs by finding their shared downstream targets. This approach adopts probabilistic models and statistical tests to measure the significance of the targets shared between the regulators and to remove insignificant co-regulating interactions that occurred by chance. Gene enrichment analysis was used to identify significant co-regulation between the transcriptional and post-transcriptional layers. These studies found that some biological processes emerged only in co-regulation and that the disruption of co-regulation may be closely related to cancers, suggesting the importance of the co-regulation of miRNAs and TFs. Another study proposed a rule-based method to discover gene regulatory modules and their target genes based on the available predicted target binding information. These works provide a good resource for exploring regulatory relationships and identifying network motifs.
However, target prediction based on sequences has a high rate of false discoveries, which affects the quality of the discoveries made by the above-mentioned methods. It would be ideal if expression data could be used to refine the discoveries. Identifying the modular structure of biological networks has greatly advanced our understanding of such networks. However, little is known about the modules that exist in miRNA-TF-gene regulation systems, and even less is known about the role of these modules in specific biological processes and key regulation assemblies. Several studies have made efforts to uncover miRNA-mRNA modules, but to some extent they are unable to detect highly credible miRNA modules.
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470 @ IJTSRD | Available Online @ www.ijtsrd.com | Volume -2 | Issue -3 | Mar-Apr 2018 Page: 1647
Data mining is the process of analyzing hidden patterns in data from different perspectives in order to categorize them into useful and effective information.
The data are collected and gathered in common areas, such as data warehouses, for efficient analysis. Data mining algorithms then facilitate business decision making and other information requirements, ultimately cutting costs and increasing revenue. Data mining is also known as data discovery or knowledge discovery.

MOAL Algorithm
The MOAL (Multi Ontology data mining at All Levels) algorithm mines cross-ontology relationships, i.e., association rules that involve GO terms present in the three sub-ontologies of GO. Using collaborative filtering, the user obtains the details for a gene ID. For the cross-ontology technique, the protein value is compared with the BP & MF, MF & CC or CC & BP value in order to obtain the gene's disease and symptoms according to the user's requirements.
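To illustrate what a cross-ontology association rule looks like, the following minimal sketch counts support and confidence for pairs of GO terms that belong to different sub-ontologies. It is a simplified pairwise illustration and not the MOAL algorithm itself; the gene identifiers, the toy annotations, the function name `cross_ontology_rules` and the two thresholds are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: gene id -> set of (sub-ontology, GO term label).
ANNOTATIONS = {
    "gene1": {("BP", "cell cycle"), ("MF", "protein binding"), ("CC", "nucleus")},
    "gene2": {("BP", "cell cycle"), ("MF", "protein binding"), ("CC", "cytoplasm")},
    "gene3": {("BP", "cell cycle"), ("MF", "kinase activity"), ("CC", "nucleus")},
    "gene4": {("BP", "apoptosis"), ("MF", "protein binding"), ("CC", "nucleus")},
}


def cross_ontology_rules(annotations, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples whose
    two terms come from different sub-ontologies (BP, MF or CC)."""
    n_genes = len(annotations)
    term_count = Counter()
    pair_count = Counter()
    for terms in annotations.values():
        term_count.update(terms)
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # keep only cross-ontology pairs
                pair_count[(a, b)] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n_genes
        if support < min_support:
            continue
        # Each frequent pair yields up to two directed rules.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / term_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


RULES = cross_ontology_rules(ANNOTATIONS)
for antecedent, consequent, support, confidence in RULES:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data only the BP & MF, BP & CC and CC & MF pairs shared by two of the four genes pass the support threshold, which mirrors the three cross-ontology combinations named above.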

Molecular Function
Molecular functions are activities that occur at the molecular level, such as "catalytic activity" or "binding activity". GO molecular function terms represent the activities themselves rather than the entities that perform the actions, and they do not specify where, when, or in what context the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, although some activities are performed by assembled complexes of gene products. It is easy to confuse a gene product name with its molecular function.
Existing module-detection methods cannot accurately infer the functional regulation of miRNAs. Meanwhile, these methods have not considered TF regulation, and their modules contain only miRNAs and genes.

Cellular Component
A cellular component describes a location, relative to cellular compartments and structures, that is occupied by a macromolecular machine when it carries out a molecular function. Biologists describe the locations of gene products in two major ways: 1. cellular structures and 2. stable macromolecular complexes.

Biological Process
A biological process is a series of events accomplished by one or more organized assemblies of molecular functions. A biological process should be distinguished from a molecular function.

Problem formulation
To identify miRNA-TF-gene co-regulatory modules, we design an objective function with three components (see figure). As mentioned above, the optimization function consists of a joint non-negative matrix factorization (NMF) term, a regularization term for three prior networks and a sparsity penalty term. Here we provide the final optimization function, minimized over W, H1, H2, ..., where each expression matrix X1, X2, X3 is decomposed into the basis matrix W of size N × K and the corresponding coefficient matrix H1, H2, H3 of size K × M. The following subsections provide the details as well as the solution of the objective function.
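The joint NMF component on its own can be sketched with standard multiplicative updates. The sketch below is only an illustration under stated assumptions: it minimizes the sum of squared reconstruction errors of the expression matrices with a shared basis matrix W, and it omits the network regularization and sparsity terms, whose exact form is not reproduced here. The function name `joint_nmf`, the random demo matrices and the iteration count are hypothetical.

```python
import numpy as np


def joint_nmf(xs, k, n_iter=200, eps=1e-9, seed=0):
    """Factor every non-negative matrix X_i (N x M_i) in xs as W @ H_i with a
    shared basis W (N x K), using Lee-Seung style multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_rows = xs[0].shape[0]
    w = rng.random((n_rows, k)) + eps
    hs = [rng.random((k, x.shape[1])) + eps for x in xs]
    history = []
    for _ in range(n_iter):
        # Update each coefficient matrix H_i with W held fixed.
        for i, x in enumerate(xs):
            hs[i] *= (w.T @ x) / (w.T @ w @ hs[i] + eps)
        # Update the shared basis W with all H_i held fixed.
        numer = sum(x @ h.T for x, h in zip(xs, hs))
        denom = w @ sum(h @ h.T for h in hs) + eps
        w *= numer / denom
        # Track the joint squared reconstruction error.
        history.append(sum(np.linalg.norm(x - w @ h) ** 2 for x, h in zip(xs, hs)))
    return w, hs, history


# Hypothetical demo: 30 samples with 8 miRNAs, 5 TFs and 12 genes.
demo_rng = np.random.default_rng(1)
X_DEMO = [demo_rng.random((30, m)) for m in (8, 5, 12)]
W_DEMO, HS_DEMO, HISTORY = joint_nmf(X_DEMO, k=4)
print(f"error after first iteration: {HISTORY[0]:.3f}, after last: {HISTORY[-1]:.3f}")
```

Because every update multiplies by a non-negative ratio, W and all H_i stay non-negative throughout, which is what makes the rows of H_i usable as module membership scores.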

OPTIMUM SOLUTION
We applied the MOAL algorithm to identify miRNA-TF-gene modules by integrating multiple independent data sources.

Intrinsic
The normal human protein value is compared with the cross-ontology value. The cross-ontology values are BP & MF, CC & BP and CC & MF. If the protein value is lower than the cross-ontology value, the condition is said to be intrinsic.

Extrinsic
The normal human protein value is compared with the cross-ontology value. The cross-ontology values are the interactions of GO terms. If the protein value is higher than the cross-ontology value, the condition is said to be extrinsic.
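The intrinsic and extrinsic conditions amount to a single comparison, which can be written as a small function. This is a sketch only: the function name `classify_condition` and the numeric values are hypothetical, and the label returned when the two values are equal is an assumption, since that case is not defined above.

```python
def classify_condition(protein_value: float, cross_ontology_value: float) -> str:
    """Label a gene's condition by comparing its protein value with a
    cross-ontology value (BP & MF, CC & BP or CC & MF)."""
    if protein_value < cross_ontology_value:
        return "intrinsic"
    if protein_value > cross_ontology_value:
        return "extrinsic"
    # Assumption: equality is not covered by the definitions in the text.
    return "undetermined"


# Hypothetical values, for illustration only.
print(classify_condition(0.42, 0.60))
print(classify_condition(0.75, 0.60))
```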

Choice of parameters
The proposed SNCoNMF algorithm requires several parameters to be set, as described in the pseudo code.
It is important to decide the value of K, the reduced dimension of the matrix factorization. Following earlier work, we performed a miRNA cluster analysis using miRNA cluster data from miRBase. As a result, we obtained about 20 clusters, each containing between 2 and 50 miRNAs. In this paper we therefore set K to 20, approximately equal to the number of miRNA clusters represented in our data. Meanwhile, we set the parameters λ1, λ2, λ3, γ1 and γ2 to 0.01, 0.01, 0.01, 20 and 10, respectively. Because TFs are scarce in the data, we set the threshold T to 2 for TFs after conducting a series of tests, while T is set to 3 for miRNAs and genes.
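One common way to turn a coefficient matrix into module members is to standardize each of its K rows and keep the entries whose z-score exceeds the threshold T. The sketch below assumes that T is used in this way, which the text above does not state explicitly; the function name `module_members` and the demo matrix are hypothetical.

```python
import numpy as np


def module_members(h, threshold):
    """For each row of the K x M coefficient matrix h, return the column
    indices whose z-score within that row exceeds the threshold."""
    mean = h.mean(axis=1, keepdims=True)
    std = h.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # a constant row yields z-scores of 0 and no members
    z = (h - mean) / std
    return [np.flatnonzero(row > threshold).tolist() for row in z]


# Hypothetical coefficient matrix with K = 2 modules and M = 10 features.
H_DEMO = np.array(
    [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 10],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    ],
    dtype=float,
)
MEMBERS = module_members(H_DEMO, threshold=2.0)
print(MEMBERS)
```

Under this reading, a lower threshold such as T = 2 admits more features into each module, which is consistent with using a smaller T for the scarcer TFs.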

Module character and size distribution
We ran the proposed SNCoNMF algorithm on a breast cancer dataset and obtained 20 miRNA-TF-gene co-regulatory modules, each composed of a set of miRNAs, TFs and genes, denoted as the miRNA module, TF module and gene module, respectively. The miRNA-TF-gene modules identified in this paper have an average of 5.5 miRNAs, 2.5 TFs and 28.15 genes per module. The average density of all modules is 0.0176. Meanwhile, we calculated the average miRNA-gene expression correlation and TF-gene expression correlation among all modules, as described in the relevant section.
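An average expression correlation between two groups of features in a module can be computed as the mean Pearson correlation over all cross pairs. The sketch below assumes Pearson correlation and a samples-by-features layout, neither of which is specified above; the function name `mean_cross_correlation` and the demo values are hypothetical.

```python
import numpy as np


def mean_cross_correlation(a, b):
    """Mean Pearson correlation between every column of a and every column
    of b. Rows are samples; columns are assumed to be non-constant."""
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    a_unit = a_centered / np.linalg.norm(a_centered, axis=0)
    b_unit = b_centered / np.linalg.norm(b_centered, axis=0)
    # Entry (i, j) of the product is the correlation of a[:, i] with b[:, j].
    return float((a_unit.T @ b_unit).mean())


# Hypothetical module: one miRNA and two genes measured in four samples.
MIRNA_EXPR = np.array([[1.0], [2.0], [3.0], [4.0]])
GENE_EXPR = np.array([[2.0, 4.0], [4.0, 3.0], [6.0, 2.0], [8.0, 1.0]])
AVG_CORR = mean_cross_correlation(MIRNA_EXPR, GENE_EXPR)
print(f"average miRNA-gene correlation: {AVG_CORR:.3f}")
```

The same function applies to TF-gene pairs by passing the TF expression columns in place of the miRNA columns.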
In addition, to verify the feasibility of our method, we ran the SNMNMF algorithm on our datasets, with TFs treated as genes. In SNMNMF, the average numbers of miRNAs, genes and TFs per module are 5.6, 26.4 and 0.55, respectively; the numbers of genes and TFs are lower than ours, especially for TFs. This demonstrates that SNCoNMF can effectively discover miRNA-TF-gene co-regulatory modules. Meanwhile, because it has more genes per module, SNCoNMF has a lower average module density.
In addition, experimental results show that all modules from SNCoNMF are enriched in at least one GO-BP term; Gene Ontology biological process term clusters were examined for the genes in each module having at least 5 genes. We found that 45% (9/20)

SYSTEM ARCHITECTURE
The cell's activity is organized as a network of interacting modules, i.e., sets of co-regulated genes that respond to different conditions. We adapt a probabilistic method for identifying regulatory modules from genomic data. The procedure identifies modules of co-regulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses of the form 'regulator X regulates module Y under conditions W'. The admin is able to update gene records and maintain the website using the specified modules. The user can search for and select a gene, and view the gene and precaution details based on the gene ID. The regulations among miRNA-TF interactions, TF-gene interactions and gene-miRNA interactions are compared with the help of the integration technique, and the relations among co-regulatory modules are identified. The protein value for each gene is generated using a Bayesian rose tree to assess the efficiency of the result.
The method was applied to a Saccharomyces cerevisiae expression data set, which shows its ability to identify functionally coherent modules and their correct regulators. Microarray experiments support three novel predictions that suggest regulatory roles for previously uncharacterized proteins.

Fig. 4. System Architecture of Disease associations.
In this system, we applied an integrated framework to infer gene regulatory modules from the cell cycle by incorporating multiple biological data sources, including gene expression profiles, gene ontology and molecular interactions. Among 846 human genes with putative roles in the regulation of the cell cycle, 46 transcription factors and 39 gene ontology groups are identified and recorded. We reconstructed regulatory modules to infer the underlying regulatory relationships. Four regulatory network motifs are identified from the interactions of the gene modules.

MERITS AND DEMERITS
The advantages and drawbacks of the existing system are as follows.

Advantages
• Evaluates the annotation consistency.
• Avoids possible inconsistent or redundant annotations.
• Classical association rule mining algorithms are used to identify patients affected by cancer.

Disadvantages
• Classical association rule mining (CARM) algorithms are not able to deal with different sources of production of GO annotations.
• Candidate rules have low Information Content.
• A large amount of information is usually missed.

Merits
The main advantages of our proposed system are:
• The medical description for a particular gene disease is easily accessible.
• Data retrieval is quick.
• Identifying the data is less complex.
• The computed medical description helps safeguard future generations.

CONCLUSION
Recent progress in biotechnology and systems biology is creating a remarkable amount of biomolecular data and semantic annotations. Biomolecular data are increasing in number and quality, but they are dispersed and only partially connected. Integrating and mining these distributed and evolving data has high potential for discovering hidden biomedical knowledge that is useful in understanding complex biological phenomena, normal or pathological, and ultimately for enhancing diagnosis, prognosis and treatment; but such integration poses huge challenges. Our work has tackled them by developing a novel and generalized way to define, and easily maintain, update and extend, an integration of many evolving and heterogeneous data sources. Our approach proved useful for extracting biomedical knowledge about complex biological processes and diseases.