Association Rule Hiding using Hash Tree

Since large repositories of data contain sensitive rules that must be protected before publication, association rule hiding has become one of the basic privacy-preserving data mining problems. Data sharing between two organizations is common in many application areas, for instance business planning or marketing. Valuable global patterns can be discovered from the combined dataset; however, some sensitive patterns that should have been kept private could also be revealed, and wide disclosure of sensitive patterns could reduce the competitive advantage of the data owner. Database outsourcing is becoming a necessary business practice in modern distributed and parallel frameworks for frequent item set identification. This paper introduces a few modifications that safeguard both client and server privacy. We recommend adding a hash tree to the existing APRIORI algorithm, which helps preserve accuracy, limit utility loss, and protect data privacy while keeping execution time small. We apply the modified algorithm to two custom datasets of different sizes.


INTRODUCTION
Data mining extracts novel and valuable knowledge from large collections of data and has become an effective analysis and decision-making technique in organizations. Sharing data for mining can bring many benefits for research and business cooperation; however, large repositories of data contain private information and sensitive rules that must be protected before publication. Motivated by the conflicting requirements of data sharing, privacy protection, and knowledge discovery, privacy-preserving data mining has become a research hotspot in the data mining and database security fields.
Two issues are addressed in privacy-preserving data mining: one is the protection of private data; the other is the protection of sensitive rules (knowledge) contained in the data. The former concerns how to obtain normal mining results when private data cannot be accessed precisely; the latter concerns how to protect sensitive rules contained in the data from being discovered, while non-sensitive rules can still be mined normally. The latter problem is called knowledge hiding in databases, which is the inverse of knowledge discovery in databases.

RELATED WORK
Data mining is one of the key problem-solving methods for many business-oriented problems, and among its tasks association rule mining is one of the most important for knowledge discovery. R. Agrawal introduced the mining of interesting association rules among items in large datasets. Mining frequent patterns is a fundamental part of database applications such as sequential pattern mining and association rule mining. Sergey Brin et al. suggested dynamic item set counting (DIC) built on the APRIORI algorithm; it assembles large item sets whose subsets are also large, which increases memory and time complexity. The algorithms proposed earlier retrieve frequent item sets by repeatedly scanning the database with APRIORI-style association rule mining; at each level all subsets of frequent patterns are retrieved again, so these algorithms generate very large sets of candidate patterns. Because the database must be scanned repeatedly, mining efficiency is reduced. To overcome these obstacles, Jiawei Han proposed FP-growth, an algorithm that avoids generating candidates and scans the database fewer times, increasing efficiency compared with previous APRIORI-based association rule mining. By avoiding the candidate generation step and scanning the database less often, the FP-tree approach proves to be faster than APRIORI. The drawback of FP-growth is that, when mining complete item sets, a large frequent item set of size X yields nearly 2^X subsets, and generating a huge number of conditional FP-trees during mining limits its efficiency.
In this paper we propose a hash-tree-based algorithm.

PROBLEM DEFINITION
To design and implement a hash tree APRIORI algorithm that reduces the time and memory complexity of execution and addresses the integrity and security issues in distributed data.

PROPOSED ALGORITHM
Rule for an Efficiency Improvement
We can improve the efficiency of APRIORI by reducing the number of comparisons made during support counting. This can be done by using hash trees.
This algorithm was implemented in a Python environment on an Intel Core i5 processor running at 2.9 GHz.
The performance of the generated rules is analyzed using support and confidence.
We need support because, with confidence alone, some rules might be produced by chance; support helps us filter out item sets that people seldom buy together before we generate association rules from them. Confidence measures the reliability of the inference made by a rule: the higher the confidence, the more likely it is that Y is present in transactions that contain X.
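The two measures can be computed directly from the transaction data. The sketch below uses a small hypothetical basket dataset purely for illustration:

```python
# Hypothetical transaction data, used only to illustrate the measures.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Reliability of the rule X -> Y: support(X u Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

print(support({"milk", "diapers"}, transactions))       # support of {milk, diapers}
print(confidence({"milk"}, {"diapers"}, transactions))  # confidence of milk -> diapers
```

Here {milk, diapers} appears in 3 of the 5 transactions, so its support is 0.6, while milk alone appears in 4, giving the rule milk -> diapers a confidence of 0.75.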

Total possible rules over d items:
3^d - 2^(d + 1) + 1
The support of a rule X -> Y depends only on the support of X ∪ Y. If the support of X ∪ Y is below the minimum support threshold, then all the 2^(|X| + |Y|) - 2 rules generated from it will waste computing power.
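The rule-count formula grows quickly with the number of items d, which is why pruning by support matters. A quick check of the formula:

```python
def total_rules(d):
    """Total possible association rules over d items: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1

for d in (2, 4, 6):
    print(d, total_rules(d))  # 2 items -> 2 rules, 4 -> 50, 6 -> 602
```

Even at d = 6 there are already 602 candidate rules, so enumerating them naively is infeasible for realistic item counts.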
So the problem is divided into two parts: frequent item set generation and rule generation.

The APRIORI principle:
If an item set is frequent, then all of its subsets must be frequent. Conversely, if an item set is infrequent, then all of its supersets are infrequent.
Support-based pruning: trimming the exponential search space based on the support measure.
Candidate generation and pruning: Ck is the set of all possible candidates and Fk is the set of frequent candidates. After APRIORI candidate generation, we use a hash tree so that candidate item sets are partitioned into different buckets and stored in the hash tree.
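Candidate generation with APRIORI subset pruning can be sketched as follows; the item names and the frequent 2-item sets below are hypothetical, chosen only to show the pruning step:

```python
from itertools import combinations

def candidate_gen(freq_prev):
    """Merge frequent (k-1)-item sets into k-item candidates,
    pruning any candidate with an infrequent (k-1)-subset."""
    prev = set(freq_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == len(a) + 1:  # merge only sets differing by one item
                # APRIORI pruning: every (k-1)-subset must itself be frequent
                if all(frozenset(s) in prev for s in combinations(union, len(a))):
                    candidates.add(union)
    return candidates

# Hypothetical frequent 2-item sets F2
f2 = {frozenset(p) for p in [("beer", "diapers"), ("bread", "diapers"),
                             ("bread", "milk"), ("diapers", "milk")]}
print(candidate_gen(f2))
```

In this example {beer, bread, diapers} and {beer, diapers, milk} are pruned because {beer, bread} and {beer, milk} are not in F2; only {bread, diapers, milk} survives as a candidate.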
During support counting, the item sets contained in each transaction are also hashed into the appropriate buckets. That way, instead of comparing each transaction with every candidate item set, it is matched only against the candidate item sets that belong to the same bucket.
This helps reduce counting time and also provides a degree of security to the data.
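A minimal sketch of this bucketing idea follows. A real hash tree branches on successive items at each level; here, as a simplification, whole candidate item sets are hashed into flat buckets, and the candidates and transactions are hypothetical:

```python
from itertools import combinations
from collections import defaultdict

NUM_BUCKETS = 7  # illustrative; a real hash tree hashes one item per tree level

def bucket_of(itemset):
    # hash() is stable within one run, which is all bucketing needs
    return hash(frozenset(itemset)) % NUM_BUCKETS

def count_support(candidates, transactions, k):
    """Count candidate supports, matching each transaction's k-subsets
    only against candidates that hash to the same bucket."""
    buckets = defaultdict(set)
    for c in candidates:
        buckets[bucket_of(c)].add(c)
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            fs = frozenset(subset)
            if fs in buckets[bucket_of(fs)]:  # looked up only in its own bucket
                counts[fs] += 1
    return counts

candidates = {frozenset({"bread", "diapers"}), frozenset({"milk", "diapers"})}
transactions = [{"bread", "milk", "diapers"},
                {"bread", "diapers", "beer"},
                {"milk", "diapers", "beer"}]
counts = count_support(candidates, transactions, k=2)
print(counts)
```

Each subset of a transaction is tested against only one bucket's candidates rather than against every candidate, which is the source of the speed-up the section describes.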

RESULTS AND DISCUSSION
For implementing the modified APRIORI algorithm, we used two custom datasets of different sizes.
The small dataset consisted of a 1000 x 9 random integer dataset with missing values. After implementing the algorithm in Python and comparing the results with the original unmodified APRIORI algorithm, we see that the APRIORI algorithm with a hash tree runs much faster on these datasets than the original.
Hence, by using the modified APRIORI algorithm with a hash tree, we can improve not only the security of the data but also the overall efficiency.

CONCLUSION
We see that the computational complexity depends upon: 1. Support threshold: lowering the threshold increases the size of the candidate set C. 2. Number of items: as the number of items grows, the sizes of both C and F may increase, requiring more space and increasing I/O cost.