PERFORMANCE OF ASSOCIATION RULES FOR DENGUE VIRUS TYPE1 AMINO ACIDS USING AN INTEGRATION OF TRANSACTION REDUCTION AND RANDOM SAMPLING (TRRS) ALGORITHM

HTML Full Text

PERFORMANCE OF ASSOCIATION RULES FOR DENGUE VIRUS TYPE1 AMINO ACIDS USING AN INTEGRATION OF TRANSACTION REDUCTION AND RANDOM SAMPLING (TRRS) ALGORITHM

D. Kerana Hanirex ^*, K. P. Thooyamani and V. Khanaa

Bharath University,Chennai-73, Tamilnadu, India.

ABSTRACT: Association rule mining is the recent research area in data mining. Frequent Itemset Mining techniques is one of the prominent techniques in pattern mining. In this system frequent itemset mining is used to find the frequent amino acids patterns in Dengue Virus Type1 data set. This system uses an integration of Transaction Reduction and Random Sampling (TRRS) approach to identify the frequent patterns in Dengue Virus Type1 amino acids sequence. Our system reveals the association between the amino acids. Our System first identifies the number of amino acids sequences suitable for each transaction and finds the number of association rules using Transaction Reduction and Random Sampling methods by varying the sample size. Experimental results show that the performance of our proposed Transaction Reduction and Random Sampling Algorithm(TRRS) works efficiently when compared to Apriori algorithm, FP Growth algorithm, Two Dimensional Transaction Reduction(TDTR) Algorithm Improved TDTR algorithm, Set Oriented Mining (SETM) algorithm and Improved SETM Algorithm (ISETM) in terms of number of association rules generated and the time taken to generate the association rules.

Keywords:

Association Rule Mining, Association Rules, Frequent Item set, Dengue Virus Type1, Apriori, FP Growth, Two Dimensional Transaction Reduction (TDTR), Random Sampling, Transaction Reduction and Random Sampling Algorithm (TRRS)

INTRODUCTION: Mining association rule is the recent research area. Data mining is used to find the hidden pattern and useful information from the data base ^{1, 9}. Association rule mining is one of the major task of the Data mining ^{2, 3}. The other data mining tasks include clustering ^{37, 30}, classification ^{4, 21}. Frequent itemset mining is not only used in market basket analysis but also used in bioinformatics such as gene expression data and protein analysis ^{10, 23}.

The algorithm which is developed for market basket problem can also be applied to solve various bioinformatics problem such as analysing frequent patterns in amino acids. Here each transaction is identified by the sequence of amino acids. Various algorithms have been proposed to find the frequent itemsets but it differs in its computational efficiency. Frequent itemsets can be converted into association rules that can be used in further applications.

Association Rule Mining: Association rule mining problem can be done in 2 steps. First find frequent itemsets and then generate rules from the frequent itemsets. A brute-force method for frequent itemset mining is to generate support and confidence measure for all generated association rules. This method is expensive because the search space is exponential to the number of itemsets present in the database ². Most algorithms that are used for association rule mining are differentiated by its search space and their computation of support value. At each stage it generates candidate itemsets and calculate its support count and remove its itemsets that are having less support which is infrequent in the database. Search space traversal may be either depth first search or breadth first search. In breadth first traversal, itemsets are generated starting from the singleton sets. In depth first traversal, it uses divide and conquer strategy. Apriori algorithm uses breadth first approach and FP Growth algorithm uses depth first approach. If the data is not fit into the memory it takes each transaction from the database one by one. Various optimization techniques such as sampling and partitioning are used to fit the transactions from the database into memory. FP Tree algorithm uses compressed tree structure that are memory resident ¹⁸. Éclat⁴³is an algorithm which uses depth first approach.

Association rule is of the form X=> Y where X and Y are the itemsets and X∩Y =Ø. This implies that a transaction which contains X also contains Y. X is the antecedent of the rule and Y is the consequent of the rule.

Confidence and support are the two important measures of Association rule mining. Support determines how frequently the rules occur in the database.

Support (X=>Y, D) = Support (XUY, D)………(1)

Confidence of an association rule X=>Y is the ratio of total occurrences of X and Y to the total number of occurrence of X.

Confidence (X=>Y,D) = Support(XUY,D) /support (X, D) ………………….(2)

Association Rule Mining is used to generate a set of Association Rules. Various kinds of interesting measures have been proposed ³^8,¹⁴ for biological datasets. The other interesting measures are lift and coverage. Lift describes the ratio between the observed support for X=>Y to the expected support value when X and Y are independent. The coverage of an association rule states that how often the rule is present in the database.

Related work: Researchers proposed various algorithms and methods to find the frequent itemset. Apriori algorithm is the well-known standard algorithm for association rule mining which requires large number of data scans and candidate generation. FP Growth algorithm is used to find the frequent itemset generation without candidate itemset generation. FP Growth is the fastest algorithm than Apriori which is based on prefix tree representation can save memory ¹⁹. Hashing technique ²² can also be used to improve the way of finding frequent itemsets. FP-Streaming ¹⁵ and Regression parameter ¹¹ has also been proposed to find the frequent itemsets. The paper ^5, ¹⁶uses Genetic algorithm for association rule mining. Genetic algorithm uses fitness function and genetic operators such as selection, crossover and mutation to find the frequent itemsets.

Soumadip Ghosh et al., proposes ³⁶ genetic algorithm to find frequent itemsets. Apriori algorithm has certain limitations ³ such as producing large number of redundant patterns. This redundant rules can be removed by various techniques such as by reducing search space, considering either maximal frequent itemsets or top-k frequent itemsets ^8, ^40, ¹⁷. Cai R, Hao Z, Wen W, et al., ¹³ proposes kernel density estimation measurement to reduce the irrelevant rules. Closed itemsets has also been considered for generation of association rules ³⁴.

Bioinformatics: Frequent itemset mining is used in Bioinformatics to analyze gene. It is used to analyze co-occurring frequent annotation patterns in molecular biology. It can also be used in cross ontology mining as well as gene ontology ^{24, 32, 6}. Frequent items mining can also be used in finding structural patterns or motif discovery in biomolecules ^31, ³⁹. Association rule mining can be used in analysis of gene expression data ^12, ⁷. It has also been used to identify the strong factors associated with the particular diseases ³³. Frequent sub graph can be identified from the molecular graph ⁴¹. Association Rules can be used to build a classifier which is more accurate to solve biological problems ^20, ²⁵.

Data Collection: Dengue virus (DENV) belongs to the family Flaviviridae and Genus Flavivirus. There are four serotypes namely (DEN 1-4) ³⁵. At present there are no proper vaccines for diseases like Dengue, Ebola and Anticancer drugs. This system will find the hidden patterns in polyprotein Dengue virus type1 DNA sequence and find the dominating amino acids using data mining techniques and to improve the quality of finding drugs for the pharmacists.

This system will increase the demands in all pharmaceutical companies through innovation and to treat the patients carefully. The association between dominating amino acids will be useful for the drug designers to develop the antibiotics for the virulent diseases caused by viruses such as ebola, dengue and anticancer drugs. This proposed system uses protein dengue virus type1 datasets from NCBI (National Centre for Biotechnology Information). It uses GenBank: AB189120.1 which consists of 3392 amino acids.

Sample Amino Acids sequence for GenBank: AB189120.1

MNNQRKKTGR PSFNMLKRAR NRVSTVSQLA KRFSKGLLSG QGPMKLVMAF IAFLRFLAIP PTAGILARWG SFKKNGAIKV LRGFKKEISN MLNIMNRRKR SVTMLFMLLP TALAFHLTTRGGEPHMIVSK QERGKSLLFK TSAGVNMCTL IAMDLGELCE DTMTYKCPRI TETEPDDVDCWCNATETWVT YGTCSQTGEH RRDKRSVALA PHVGLGLETR TETWMSSEGA WRQIQKVETWALRHPGFTVM ALFLAHAIGT SITQKGIIFI LLMLVTPSMA MRCVGIGNRD FVEGLSGATW

FIG. 1: AMINO ACIDS SEQUENCE FOR DENGUE VIRUS TYPE1 DATASET

Data Preprocessing: To improve the quality of data, it requires pre-processing techniques. Data mining task includes data cleaning, data transformation, normalizing, data aggregation and discretization. This system uses data transformation as a pre-processing techniques which transforms the data suitable for analysis.

Selecting suitable number of amino acids sequence for a transaction: Our first part of the research work is to identify the suitable number of sequences. Polyprotein Dengue Virus Type 1 Dataset consists of sequence of 3392 amino acids. Our research work first identifies how many amino acids sequence we can take together for a transaction. For the analysis, amino acids are taken as sequence of 10,11,12,13,14,15,16,17,18,19 and 20 from the data set. The number of association rules generated for each sequence T10, T11, T12, T13, T14, T15, T16, T17, T18, T19 and T20 are measured by varying confidence and support measure. Association rules are measured using Apriori algorithm in R tool. The following table shows the number of rules generated for some of the amino acids sequence.

The following table Table 1, 2, 3. shows the number of Association Rules generated for different aminoacids sequence by taking support=0.1 and varying confidence.

TABLE 1: NUMBER OF ASSOCIATION RULES GENERATED FOR DIFFERENT AMINOACIDS SEQUENCE BY TAKING SUPPORT = 0.1 AND BY VARYING CONFIDENCE FROM 0.9 TO 0.1

Confidence	T10	T12	T14	T16	T18	T20
0.9	-	-	-	-	-	3
0.8	-	-	-	-	58	182
0.7	-	-	18	84	211	483
0.6	-	12	59	226	440	751
0.5	5	26	105	261	483	840
0.4	11	34	113	292	518	869
0.3	16	40	116	294	521	872
0.2	16	40	116	294	521	872
0.1	16	40	116	294	521	872

TABLE 2: NUMBER OF ASSOCIATION RULES GENERATED FOR DIFFERENT AMINOACIDS SEQUENCE BY TAKING SUPPORT = 0.2 AND BY VARYING CONFIDENCE FROM 0.9 TO 0.1

Confidence	T10	T12	T14	T16	T18	T20
0.9	-	-	-	3	220	802
0.8	-	-	21	227	1687	4645
0.7	-	-	302	1500	5687	10807
0.6	16	75	1133	3909	11113	18033
0.5	87	321	2134	5702	14071	22610
0.4	268	572	2609	6631	15916	24985
0.3	331	693	2895	7076	16580	26075
0.2	381	733	2976	7196	16777	26279
0.1	385	740	2980	7197	16778	26279

TABLE 3: NUMBER OF ASSOCIATION RULES GENERATED FOR DIFFERENT AMINOACIDS SEQUENCE BY TAKING SUPPORT=0.3 AND BY VARYING CONFIDENCE FROM 0.9 TO 0.1

Confidence	T10	T12	T14	T16	T18	T20
0.9	-	-	-	-	7	38
0.8	-	-	-	20	229	684
0.7	-	-	58	262	842	1783
0.6	1	17	213	721	1788	2991
0.5	24	59	398	925	2075	3505
0.4	61	110	447	1071	2282	3730
0.3	68	126	480	1093	2315	3785
0.2	71	128	484	1099	2326	3793
0.1	71	128	484	1099	2326	3793

The above tables show that T20 sequence reveal association rules for higher confidence and for increasing support values.

The following Fig. 2 represents the number Association Rules generated for different amino acids sequence by taking support=0.3 and by varying confidence.

FIG. 2: NUMBER OF ASSOCIATION RULES GENERATED FOR DIFFERENT AMINO ACIDS SEQUENCE FOR SUPPORT =0.3

The above graph reveals that T20 sequence exhibits association rules for all confidence measure. Hence 20 amino acids sequence is taken for each transaction which will be considered for our further analysis.

In further analysis, the number of association rules generated based on T20 sequence by varying different confidence and support measure. The following graph Fig. 3 shows the number of association rules generated for T20 sequence by varying different confidence and support measure.

This data set consists of minimum of 8 different items and a maximum of 15 different items in a transaction. The following table Table 4 shows the number of transactions that contain different items of different size or length.

TABLE 4: MOST FREQUENT ITEMS IN THE SPARSE MATRIX AND ITS NUMBER OF OCCURRENCES

Items	L	G	V	S	E	other
Number of occurences	143	140	133	126	124	1356

TABLE 5: LENGTH DISTRIBUTION OF DIFFERENT ITEMS IN A TRANSACTION

Item Length	8	9	10	11	12	13	14	15
Number of Transactions	4	3	20	33	57	33	13	7

FIG. 2

FIG. 3: ANALYSIS OF DENGUE VIRUS AMINO ACIDS DATA SET

From the above table Table 5, it reveals that there are 57 transactions of length 12 different items in the dataset. Also there are 7 transactions having itemset of length 15. The following figure Fig. 4 shows the frequencies of amino acids sequence in the dengue virus type 1 amino acids data set.

FIG. 4: ITEM FREQUENCIES OF AMINO ACID SEQUENCE

Apriori Algorithm: The Apriori Algorithm is the most well-known association rule algorithm. It uses downward closure property. This algorithm is based on largest item set property which states that “Any subset of a large item set must be large” ^{1, 2}.

In our earlier research work, the Apriori algorithm is implemented for Dengue Virus Type1 Dataset with 777 aminoacids ²⁶. This paper implements Apriori algorithm for GenBank: AB189120.1 which consists of 3392 amino acids. This Apriori algorithm is implemented using R tool.

FIG. 5 NUMBER OF ASSOCIATION RULES GENERATED USING APRIORI ALGORITHM BY VARYING SUPPORT AND CONFIDENCE MEASURE

From the above Fig. 5 we can understand that from the confidence value 0.1 to 0.5 the system reveals similar number of association rules and above the confidence value 0.5 the number of rules generated gets decreased. The above graph shows that Apriori algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values.

FP Growth Algorithm: FP Growth algorithm is used to find the frequent itemset generation without candidate itemset generation. FP Growth algorithm is a two-step process. In the first step, it builds compact data structure called the FP-tree. It builds this FP Tree using 2 scans over the database. From the FPTree it generates frequent itemsets. FP Growth algorithm is implemented for this data set. The number of association rules generated by the FP Growth algorithm is measured by varying support value from 0.4 to 0.6 and confidence value 0.1 to 0.8.This algorithm is implemented using Rapid Miner tool.

The following Fig. 6 shows that FP Growth algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values. For the support value 0.4 and 0.5 it reveals similar number of association rules.

FIG. 6: NUMBER OF ASSOCIATION RULES GENERATED USING TDTR ALGORITHM BY VARYING SUPPORT AND CONFIDENCE

The above graph shows that TDTR algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values. This TDTR algorithm clearly exhibits the association rules for higher confidence value.

Transaction Reduction and Random Sampling (TRRS) Algorithm: Our second part of the research work implements Transaction Reduction and Random Sampling Algorithm (TRRS) which integrates transaction reduction and random sampling approach to find the frequent dominating amino acids and to generate association rules in Dengue Virus Type1 dataset This algorithm integrates our Two Dimensional Transaction Reduction (TDTR) Algorithm with the Random Sampling method. Our earlier approach integrates TDTR algorithm with systematic sampling with 777 amino acids sequence. By integrating TDTR algorithm with sampling the efficiency of the algorithm gets increased ²⁷. This TRRS algorithm integrates Two Dimensional Transaction Reduction (TDTR) Algorithm with the Random Sampling for GenBank: AB189120.1 which consists of 3392 amino acids.

TRRS Algorithm:

//Algorithm to find frequent itemset and to generate association rules:

for each t_i∈ D do

begin

count the number of items in count1[i]

if the count1[i] ≥ min_sup then put the transactions in to D₁

end

for each I_i∈ D₁ do

begin

count the number of transactions in count2[i]

If the count2 [i] <min_sup then remove that I_ifrom D₁

end

begin

select optimal sample size s (Random sampling) from D_1.

find the frequent item sets from D₁ using FP-GROWTH algorithm with min_sup

generate association rules with min_conf with the optimal sample size

end

Algorithm: Transaction Reduction and Random Sampling: This TRRS algorithm first implements TDTR algorithm to reduce the size of the Database for further analysis. Then it selects the optimal sample size using Random sampling method. Optimal sample is selected based on the number of association rules revealed. Based on the optimal size, the frequent item sets are generated from the reduced database using FP-GROWTH algorithm with min_sup and association rules are found based on min_conf.

Generation of Association Rules: TRRS Algorithm (With Replacement)

TABLE 6: NUMBER OF ASSOCIATION RULES GENERATED USING TRRS ALGORITHM BY VARYING CONFIDENCE AND SAMPLE SIZE FOR SUPPORT = 0.6(WITH REPLACEMENT)

		Sample size
Confidence	30%	40%	50%	60%	70%	80%	90%
0.1	332	516	506	404	400	350	332
0.2	332	516	506	404	400	350	332
0.3	321	485	477	392	386	342	322
0.4	264	407	397	333	324	289	265
0.5	212	317	297	248	247	216	201
0.6	149	212	205	168	163	131	123
0.7	76	134	130	110	117	86	80
0.8	41	66	48	48	44	26	19

The above graph shows that TRRS algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values. The association rules are measured by varying sample size (With Replacement). From the above graph we can understand that sample size 40% reveals large number of association rules when compared with other sampling size. Our research work identifies that instead of considering the entire database it is enough to consider 40% of the transactions from the original database.

FIG. 7: NUMBER OF ASSOCIATION RULES GENERATED USING TRRS ALGORITHM BY VARYING CONFIDENCE AND SAMPLE SIZE FOR SUPPORT = 0.6 (WITH REPLACEMENT)

Generation of Association Rules: TRRS Algorithm (Without Replacement)

TABLE 7: NUMBER OF ASSOCIATION RULES GENERATED USING TRRS ALGORITHM BY VARYING CONFIDENCE AND SAMPLE SIZE FOR SUPPORT = 0.6 (WITHOUT REPLACEMENT)

	Sample size
Confidence	30%	40%	50%	60%	70%	80%	90%
0.1	402	628	402	472	402	332	336
0.2	402	628	402	472	402	332	336
0.3	390	583	394	447	385	323	320
0.4	325	504	324	374	317	268	264
0.5	245	367	247	280	227	199	200
0.6	177	223	162	181	150	122	129
0.7	103	150	104	112	102	81	84
0.8	56	88	51	48	40	12	20

FIG. 7

FIG. 8: NUMBER OF ASSOCIATION RULES GENERATED USING TRRS ALGORITHM BY VARYING CONFIDENCE AND SAMPLE SIZE FOR SUPPORT = 0.6 (WITHOUT REPLACEMENT)

The above graph shows that sampling with replacement and without replacement doesn’t produce much variations in finding association rules in association rule mining.

Performance Comparison:

Comparison of Algorithms: TRRS with Apriori

FIG. 9: COMPARISON OF NUMBER OF ASSOCIATION RULES GENERATED BY PROPOSED TRRS ALGORITHM WITH APRIORI FOR SUPPORT = 0.4

Comparison of Algorithms: TRRS with FP Growth

FIG. 10: COMPARISON OF NUMBER OF ASSOCIATION RULES GENERATED BY PROPOSED TRRS ALGORITHM WITH FP GROWTH FOR SUPPORT=0.4

TRRS Algorithm (Random Sampling Vs Stratified Sampling):

FIG. 11: COMPARISON OF NUMBER OF ASSOCIATION RULES GENERATED BY PROPOSED TRRS ALGORITHM (RANDOM SAMPLING VS STRATIFIED SAMPLING)

Comparison of TRRS Algorithm with SETM, ISETM and ITDTR:

FIG. 12: COMPARISON OF NUMBER OF ASSOCIATION RULES GENERATED BY PROPOSED TRRS ALGORITHM WITH SETM, ISETM AND ITDTR FOR SUPPORT = 0.4

Research Findings:

Transaction Reduction approach can also be combined with other sampling methods.

Random sampling method is an efficient sampling method to be combined with transaction reduction.

TRRS algorithm works efficiently than Apriori algorithm.

It reveals stable and similar behaviour with FP Growth algorithm. It produces more number of association rules than FP Growth algorithm.

TRRS algorithm produce optimal association rules. It produces more association rules for decreasing confidence values and less association rules for increasing support values.

Sample Screen shots:

FIG. 13: ASSOCIATION RULES IN TABLE VIEW FOR SUPPORT=0.6, SAMPLING=40%, CONFIDENCE=0.8

FIG. 14: ASSOCIATION RULES IN TEXT VIEW FOR SUPPORT=0.6, SAMPLING=40%, CONFIDENCE=0.8

FIG. 14

FIG. 15: ASSOCIATION RULES IN GRAPH VIEW FOR SUPPORT=0.6, SAMPLING= 40%, CONFIDENCE = 0.8

Research Findings: From the above figure we can identify the Largest Frequent Itemsets are {I,D,F,V},{I,D,F,E},{V,D,N,E},{V,D,F,I},{V,T,D} for Support = 0.6, Sampling = 40%, Confidence = 0.8. The Dominating Amino acids are Isolecuine, Aspartic Acid, Phenylalanine, Valine, Glutamic acid, Asparagine, Threonine which give knowledge to the pharmacists in drug discovery.

CONCLUSIONS: In this research work, we have implemented TDTR (Two Dimensional Transaction Reduction) Algorithm. It is integrated with Random Sampling methods which is our proposed Transaction Reduction and Random Sampling method to find frequent itemsets and hence to generate association rules. This algorithm proves its efficiency and accuracy.

CONTRIBUTIONS:

T20 amino acid sequence is suitable for each transaction.

TDTR algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values.

TRRS algorithm generates more association rules for decreasing confidence values and less association rules for increasing support values.

Sampling size 40% is suitable for generating frequent itemsets in association rule mining.

Sampling with replacement and without replacement doesn’t produce much variations in association rule mining.
Transaction Reduction can also be combined with other sampling methods.

Random sampling method is efficient to be combined with transaction reduction.

Generated frequent itemsets and association rules give knowledge to the pharmacists in drug discovery

TRRS method exhibits similar behaviour with FPGrowth algorithm which is superior than Apriori algorithm

FUTURE WORK: In future, the transaction reduction approach can be combined with Partitioning or Distributed methods to deal with large datasets. This research work concentrates on frequent items sets. Finding infrequent itemsets and negative association rules is an open topic for the future research work.

BENEFITS OF THIS RESEARCH WORK: This research work will find the hidden patterns in Polyprotein Dengue Virus Type1 DNA sequence and find the dominating amino acids using data mining techniques and to improve the quality of finding drugs for the pharmacists. This system will increase the demands in all pharmaceutical companies through innovation and to treat the patients carefully. The association between dominating amino acids will be useful to the drug designers to develop the antibiotics for the virulent diseases caused by viruses such as Ebola, Dengue, Zika and Anticancer drugs. Our research work efficiently finds the dominating amino acids in Dengue Virus Type1 Dataset using an integration of transaction reduction and random sampling approach.

REFERENCES:

Agrawal, R., T. Imielinksi and A. Swami, “Database mining: A performance perspective”, IEEE Transactions on Knowledge and Data Engineering, 1993; 5(6): 914-925.
Agrawal, R. and R. Srikant,” Fast algorithms for mining association rules”, Proc. 20 Int. Conf. Very Large Data Bases, VLDB, edited by J.B. Bocca, M. Jarke and C. Zaniolo, Morgan Kaufmann, 1994; 12: 487-499.
Agrawal, R., T. Imielinski and A. Swami, “Mining Association rules between sets of items in large databases”, Proc. of the ACM SIGMOD Int. Conf. on Management of Data ACM SIGMOD ‘93,Washington, USA, 1993; pp: 207-216.
Arun K Pujari,” Data Mining Techniques”, 5^th, Universities Press (India) Private Limited, 2003.
Anandhavalli M., Suraj Kumar Sudhanshu Ayush Kumar and M.K. Ghose,” Optimized association rule mining using genetic algorithm”, Advances in Information Mining, 2009; 1(2): 01-04.
Artamonova II, Frishman G, Gelfand MS, et al., “Mining sequence annotation databanks for association patterns”, Bioinformatics, 2005; 21:iii49–57.
Becquet C, Blachon S, Jeudy B, et al., “Strong-association- rule mining for large-scale gene-expression data analysis: a case study on human SAGE data”. Genome Biology, 2002; 3.
Bayardo RJ,” Efficiently mining long patterns from data- bases”, Proceedings of the ACM SIGMOD International Conference on Management of Data Seattle WA USA, ACM, 1998; 85–93.
Chen, M.S., J. Han and P.S. Yu,” Data Mining: An Overview from a Database Perspective”, IEEE Trans. Knowledge and Data Engg, 1996; 866-883.
Carmona-Saez P, Chagoyen M, Rodriguez A, et al., ”Integrated analysis of gene expression by association rules discovery”, BMC Bioinformatics, 2006; 7:54.
Chang J W. Lee,” A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams”, Int. journal of Information Science and Engg., 2004; 20: 753-762.
Creighton C, Hanash S, “Mining gene expression databases for association rules”, Bioinformatics, 2003; 19: 79–86.
Cai R, Hao Z, Wen W, et al.,” Kernel based gene expression pattern discovery and its application on cancer classification”, Neurocomputing, 2010; 73: 2562–70.
Franceschini A, Szklarczyk D, Frankild S, et al., STRING v 9.1: “protein-protein interaction networks with increased coverage and integration” Nucleic Acids Res 2012; 41: D808–15.
C,J. Han, J. Pei, X. Yan P.S. Yu, ”Mining Frequent patterns in data streams at multiple time granularities in Data Mining Next Generation Challenges”, 2003.
Ghosh, A. and B. Nath, “Multi-objective rule mining using genetic algorithms”, Information Sciences, 2004; 163: 123-133.
Gouda K, Zaki MJ, “Gen Max: an efficient algorithm for mining maximal frequent itemsets”, Data Min Knowl Discov, 2005; 11: 223–42.
Han J, Pei J, Yin Y, et al., “Mining frequent patterns without candidate generation: a frequent-pattern tree approach”, Data Min Knowl Discovery” 2004; 8:53–87.
Han,. Pei J and Y. Yin,” Mining Frequent Patterns without candidate generation “, Proceedings of the Conference on the Management of Data SIGMOD’00 2000.
He J, Hu H, Chen B, et al.,” Rule extraction from SVM for protein structure prediction Rule”, 2008; 80: 227–52.
Jiawei Han, Hong Cheng, Dong Xin and Xifeng Yan, “Frequent pattern mining current status and future directions”, Data Mining Knowledge Discovery, 2007, 15:55-86.
Jin, C, W. Qian, C. Sha, J.X. Yu, A. Zhou ,” Dynamically Maintaining Frequent Items Over a Data Stream”, In CIKM the International Conference on Information and Knowledge Management, 2003.
Koyuturk M, Kim Y, Subramaniam S, et al.,” Detecting conserved interaction patterns in biological networks”, Journal of Computational Biology, 2006; 13:1299-322.

Karpinets TV, Park BH, Uberbacher EC,” Analyzing large biological datasets with association networks”, Nucleic Acids Res, 2012; 40:e131.
Karabatak M, Ince MC,” An expert system for detection of breast cancer based on association rules and neural network”, Expert Syst Appl 2009; 36: 3465–9.
Kerana Hanirex. D, K.P. Kaliyamurthie,” Finding the Dominating Amino Acids in Dengue Virus (Type-1) Study on mining frequent itemsets”, Int. Journal of Pharama and Bio Sciences, 2013; 4(3): (B), 880 – 889.
Kerana Hanirex. D, K.P. Kaliyamurthie,” An Adaptive Transaction Reduction Approach for Mining Frequent Itemsets: A Comparative Study on Dengue Virus Type1”, Int. Journal of Pharma and Bio Sciences, 2015; 6(2): (B) 336-340.
Kerana Hanirex. D,” An Efficient TDTR Algorithm for Mining Frequent Itemsets”, International Journal of Electronics and Computer Science Engineering, 2012; V2(N1): 251-256.
Kerana Hanirex. D, K.P. Kaliyamurthie, A. Kumaravel,” Analysis of Improved TDTR Algorithm for mining frequent itemsets using Dengue virus type1 Dataset: A combined approach”, Int. Journal of Pharma and Bio Sciences, 2015; 6(2): (B) 228-295.
Kerana Hanirex. D, M.A. Dorai Rengaswamy,” Efficient Algorithm for Mining Frequent Itemsets using Clustering techniques”; IJCSE, 2011; 3(3):1028- 1032.
Leung K-S, Wong K-C, Chan T-M, et al., ”Discovering protein DNA binding sequence patterns using association rule mining”, Nucleic Acids Res, 2010, 38: 6324–37.
Manda P, Ozkan S, Wang H, et al., “Cross-ontology multi- level association rule mining in the gene ontology”; PLoS One, 2012; 7:
Ma L, Assimes T, Asadi N, et al.,” An almost exhaustive search-based sequential permutation method for detecting epistasis in disease association studies”, Genet Epidemiol, 2010; 34:434–43.
Pan F, Tung A, Cong G, et al., “COBBLER: combining column and row enumeration for closed pattern discovery”, Proceedings of the 16^th International Conference on Scientific and Statistical Database Management SSDBM, Santorini Island, Greece, Washington, DC: IEEE Computer Society 2004; 21–30.
Shiu SY, Jiang WR, Porterfield JS, Gould EA,” Envelopeprotein sequences of dengue virus isolates TH-36 and TH-Sman and identification of a type specific genetic marker for dengue and tick borne flaviviruses”, Journal of General Virology, 73; 207-212.
Soumadip Ghosh, Sushanta Biswas, Debasree Sarkar, Partha Pratim Sarkar “Mining Frequent Itemsets Using Genetic Algorithm”, International Journal of Artificial Intelligence & Applications (IJAIA), 2010; 1(4): 133-143.
Singh Vijendra, Laxman Sahoo Kelkar Ashwini, “An Effective Clustering Algorithm for Data Mining”, IEEE conference on Data Storage and Data Engineering (DSDE), 2010; 978-1-4244-5678-9.
Tan P-N, Kumar V, Srivastava J, “Selecting the right inter- estingness measure for association patterns”, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Edmonton Alberta Canada. New York ACM, 2002; 32–41.
Tweedie-Cullen RY, Brunner AM, Grossmann J, et al. “Identification of combinatorial patterns of post-translational modifications on individual histones in the mouse brain”, 2012; PLoSOne; 7: e 36980.
Tuzhilin A, “Handling very large numbers of association rules in the analysis of microarray data”, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Edmonton Alberta Canada, New York, ACM, 2002; 23–6.
Van Leemput K, Verschoren A,” Modeling networks as probabilistic sequences of frequent subgraphs”, http://win.ua.ac.be/adrem/bibrem/pubs/MLSB08.pdf .
World Health Organization Dengue fever and dengue hemorrhagic fever; Geneva, www.who.int/csr/disease/ dengue /2009
Zaki M, Parthasarathy S, Ogihara M, Li W, “New algorithms for fast discovery of association rules”, Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD), Newport Beach CA USA Palo Alto CA: AAAI Press, 1997; 283–6.

How to cite this article:

Hanirex DK, Thooyamani KP and Khanaa V: Performance of association rules for Dengue Virus Type1 amino acids using an Integration of Transaction Reduction and Random Sampling (TRRS) algorithm. Int J Pharm Sci Res 2017; 8(6): 2578-87.doi: 10.13040/IJPSR.0975-8232.8(6).2578-87.

All © 2013 are reserved by International Journal of Pharmaceutical Sciences and Research. This Journal licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Article Information

Sr No: 30

Page No: 2578-2587

Size: 1004

Download: 1523

Cited By: 1

Language: English

Licence: IJPSR

Authors: D. Kerana Hanirex *, K. P. Thooyamani and V. Khanaa

Authors Address: Bharath University,Chennai, Tamilnadu, India.

Email: keranahanirex.cse@bharathuniv.ac.in

Received: 02 December, 2016

Revised: 16 January, 2017

Accepted: 17 February, 2017

DOI: 10.13040/IJPSR.0975-8232.8(6).2578-87

Published: 01 June, 2017

Download

ISSN (Online): 0975-8232,
ISSN (Print): 2320-5148

International Journal Of
Pharmaceutical Sciences And Research

An International Journal published monthly
An Official Publication of Society of Pharmaceutical Sciences and Research

PERFORMANCE OF ASSOCIATION RULES FOR DENGUE VIRUS TYPE1 AMINO ACIDS USING AN INTEGRATION OF TRANSACTION REDUCTION AND RANDOM SAMPLING (TRRS) ALGORITHM

Article Information

International Journal Of Pharmaceutical Sciences And Research

An International Journal published monthly An Official Publication of Society of Pharmaceutical Sciences and Research

PERFORMANCE OF ASSOCIATION RULES FOR DENGUE VIRUS TYPE1 AMINO ACIDS USING AN INTEGRATION OF TRANSACTION REDUCTION AND RANDOM SAMPLING (TRRS) ALGORITHM

Article Information

International Journal Of
Pharmaceutical Sciences And Research

An International Journal published monthly
An Official Publication of Society of Pharmaceutical Sciences and Research