A Technological Survey on Privacy Preserving Data Publishing

Systems hold an enormous collection of records about each individual, and these data need to be highly secured. Because the records are easily available, the information they contain is at high risk. The records are used for business purposes as well as for decision-making in the respective domain. However, any data in raw form count as sensitive records, since they contain the complete information of each individual. At present, data publishing mainly depends on rules, policies, and guidelines, so that only the required information is published according to the agreements. Hence, privacy preserving data publishing can be defined as the tools and methods for publishing information while preserving the privacy of the records. This paper surveys the various techniques and algorithms designed so far to preserve the privacy of data.

Keywords: k-anonymity, privacy preserving data publishing, l-diversity, slicing, anatomization

I. INTRODUCTION
The original data of an individual are highly sensitive, so there is a risk of information breach. Such data may include the individual's name, age, gender, employee code, company name, address, contact number, salary, etc. Therefore, the main challenge for data privacy is to design methods and tools for publishing data in a risky environment.
In privacy preserving data publishing, there are two stages in publishing the records: data collection and data publishing. The first task is to collect all data from the record owners in the required domain, and the second task is to publish the collected data in such a way that the identity of an individual cannot be determined. The data publisher holds the original set of records, and after the records are collected the anonymization technique is applied in this phase only. The data are then forwarded to the data recipient, who publishes them to the public. Various anonymization techniques can be applied to the original table to make it more secure; these include generalization, suppression, etc.
Anonymization is defined as a process that removes or replaces the identity of the individual in any record. The original table, or the original set of records, must satisfy one of the anonymization techniques. In generalization, the values of the records are generalized. For example, ages 10-20 are grouped into one range, ages 20-30 into another, and so on. In suppression, values or records are either deleted or replaced with less distinct values in order to maintain uniformity.
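Generalization and suppression as described above can be illustrated with a minimal Python sketch. The function names, the record fields, the range width of 10, and the `*` mask character are assumptions made for illustration; they do not come from any particular tool. Because the ranges in the text share their boundaries, the sketch treats each range as half-open, so an age of exactly 20 falls into "20-30".

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the half-open range [low, low + width) that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width}"


def suppress(value: str, keep: int = 0, mask: str = "*") -> str:
    """Keep the first `keep` characters of a value and mask the remaining ones."""
    return value[:keep] + mask * (len(value) - keep)


def anonymize_record(record: dict) -> dict:
    """Drop the direct identifier, generalize the age, and partly suppress the zip code."""
    return {
        "age": generalize_age(record["age"]),
        "zip": suppress(record["zip"], keep=3),
        "salary": record["salary"],  # sensitive value is published unchanged
    }


# Hypothetical original records (illustrative data only).
original = [
    {"name": "Alice", "age": 27, "zip": "47677", "salary": 30000},
    {"name": "Bob", "age": 20, "zip": "47602", "salary": 45000},
    {"name": "Carol", "age": 19, "zip": "47905", "salary": 50000},
]

published = [anonymize_record(r) for r in original]
for row in published:
    print(row)
```

In this sketch the name is removed entirely, which corresponds to suppressing a whole attribute, while the zip code is only partly masked so that some utility is retained.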

II. RELATED WORK
Privacy is one of the essential factors in publishing data effectively. To preserve privacy, various techniques have been introduced in privacy-preserving data publishing (PPDP). The first technique is k-anonymity, which states that, given person-specific data, the release must guarantee that the individuals who are the subjects of the data cannot be re-identified while the data remain useful [1]. A release is said to satisfy k-anonymity if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals in the release [2]. This technique has some vulnerabilities, which led to the proposal of a new technique known as l-diversity. A q*-block is said to be l-diverse if it contains at least l "well represented" values for the sensitive attribute S [5]. A table is said to be l-diverse if and only if every q*-block is l-diverse. l-diversity is simple to state, but in practice it can be very difficult to achieve. The key idea is to limit the disclosure risk, which requires measuring the disclosure risk of the anonymized table. The limitation of l-diversity is that it is restricted by its assumptions about the adversary's knowledge. t-closeness proposes a novel notion that formalizes the idea of background knowledge [7].
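The two definitions above can be checked mechanically on a small table. The following is a minimal sketch, not a reference implementation: the function names, the column names, and the sample rows are hypothetical, and "well represented" is interpreted in its simplest form, as at least l distinct sensitive values per block (other interpretations exist and would need a different test).

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Group rows into blocks that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every block contains at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return all(len(block) >= k for block in groups.values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every block contains at least l distinct sensitive values.

    Assumption: "well represented" is read as "distinct".
    """
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in block}) >= l for block in groups.values()
    )


# Hypothetical, already generalized table (illustrative data only).
table = [
    {"age": "20-30", "zip": "476**", "disease": "flu"},
    {"age": "20-30", "zip": "476**", "disease": "flu"},
    {"age": "30-40", "zip": "479**", "disease": "cancer"},
    {"age": "30-40", "zip": "479**", "disease": "flu"},
]

qi = ["age", "zip"]
print(is_k_anonymous(table, qi, k=2))
print(is_distinct_l_diverse(table, qi, "disease", l=2))
```

The sample table is 2-anonymous, yet its first block holds a single disease value, so it is not 2-diverse. This is the kind of vulnerability of k-anonymity that motivates l-diversity.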
To measure the distance between two distributions of a sensitive attribute, the Earth Mover's Distance (EMD) is considered the best measure. An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in that class and the distribution of the attribute in the complete table is no more than a threshold t. Slicing is a newer technique that partitions the data both horizontally and vertically [11].
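The t-closeness condition described above can be sketched in Python for a numeric sensitive attribute. This sketch assumes the ordered-distance form of the Earth Mover's Distance, in which the ground distance between the i-th and j-th of m sorted values is |i - j| / (m - 1); other ground distances would be needed for categorical attributes. The function names and the salary data are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction


def distribution(values, domain):
    """Relative frequency of each domain value, in domain order."""
    counts = Counter(values)
    total = len(values)
    return [Fraction(counts[v], total) for v in domain]


def ordered_emd(p, q):
    """EMD between two distributions over the same ordered domain.

    Assumption: ground distance |i - j| / (m - 1), which gives
    EMD = (1 / (m - 1)) * sum_i |sum_{j <= i} (p_j - q_j)|.
    """
    m = len(p)
    if m < 2:
        return Fraction(0)
    running = Fraction(0)
    total = Fraction(0)
    for p_i, q_i in zip(p, q):
        running += p_i - q_i      # cumulative surplus that must still be moved
        total += abs(running)
    return total / (m - 1)


def satisfies_t_closeness(classes, table_values, t):
    """True if every equivalence class is within distance t of the whole table."""
    domain = sorted(set(table_values))
    q = distribution(table_values, domain)
    return all(ordered_emd(distribution(c, domain), q) <= t for c in classes)


# Hypothetical salaries (in thousands) and two equivalence classes.
salaries = [3, 4, 5, 6, 7, 8, 9, 10, 11]
low_class = [3, 4, 5]
mixed_class = [6, 8, 11]

domain = sorted(set(salaries))
q = distribution(salaries, domain)
print(ordered_emd(distribution(low_class, domain), q))    # 3/8
print(ordered_emd(distribution(mixed_class, domain), q))  # 1/6
```

The class holding only the three lowest salaries lies much further from the overall distribution than the mixed class, so it would be the first to violate a given threshold t.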
In slicing, data are grouped randomly, so the resulting picture of the data is not clear, and there is further information loss. To address the limitations of slicing, overlapping slicing was introduced [13]; it handles the data attributes using the concept of fuzzy clustering. It preserves the utility of the published data by adding an attribute to more than one column, so that the duplicated attributes are combined to obtain a better correlation value. However, it fails to support high-dimensional data along with multiple sensitive data. To overcome the drawback of overlapping slicing, the anatomization technique was introduced [16]. This technique resolves the limitation of overlapping slicing, and its loss of information is low. It strictly follows the properties of k-anonymity as well as l-diversity. In the anatomy approach, the data are published as two tables that share a join attribute, and the join of these two tables is lossy. The different techniques are discussed and compared in the table.

It is vulnerable to the skewness attack as well as the similarity attack.
It is not vulnerable to this type of attack.

Disclosure
It does not prevent attribute disclosure but prevents identity disclosure.
It does not prevent attribute disclosure.
It does not prevent attribute disclosure.
It prevents attribute disclosure.

Ease
It is simple, easy to understand, and achievable.
It may be difficult to achieve.
It is somewhat easy to achieve in comparison to l-diversity.
It is easy to achieve.

Categorization
It is a privacy model.
It comes under the privacy-preserving data publishing (PPDP) category.
It comes under the privacy-preserving data publishing (PPDP) category.
It comes under the privacy-preserving data publishing (PPDP) category.

Privacy against attacker
There is no guarantee of privacy against an attacker using background knowledge.
It also does not guarantee privacy against such an attacker.
It also does not guarantee privacy against such an attacker.
It may or may not guarantee privacy against such an attacker.

Probabilistic Attack
Probabilistic attack is not possible.
Probabilistic attack is not possible.
Probabilistic attack is possible.
Probabilistic attack is not possible.

Record Linkage
Record linkage is possible.
Record linkage is possible.
Record linkage is possible.
Record linkage is not possible.

Attribute Linkage
Attribute linkage is possible.
Attribute linkage is possible.
Attribute linkage is not possible.
Attribute linkage is also not possible.

Monotonicity Property
It satisfies the monotonicity property.
It also satisfies the monotonicity property.
It may or may not satisfy the monotonicity property.
It satisfies the monotonicity property.

Categorization
It does not have any categorization.
It has been categorized into three types.
It does not have any categorization.
It has been categorized into two categories: slicing and overlapped slicing.

Data Utility loss
Data utility loss is medium.
Data utility loss is medium.
Data utility loss is high.
Data utility loss is very low in the case of high dimensionality.

Membership Disclosure
Yes, it is possible.
Yes, it is possible.
No, it is not possible.
No, it is not possible.

International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456 @ IJTSRD | Available Online @ www.ijtsrd.com
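The anatomy approach described in this section, in which the data are published as two tables that share a join attribute, can be illustrated with a minimal Python sketch. It is a simplification made under stated assumptions: records are grouped naively in consecutive chunks of a fixed size, whereas a real anatomization algorithm would choose the groups so that each one satisfies l-diversity. The function name, the `group_id` join attribute, and the sample rows are hypothetical.

```python
from collections import Counter


def anatomize(rows, quasi_identifiers, sensitive, group_size):
    """Split a table into a quasi-identifier table and a sensitive table.

    Both tables carry a group_id column, which is the only join attribute.
    Assumption: groups are consecutive chunks of `group_size` rows.
    """
    qi_table, sensitive_table = [], []
    for start in range(0, len(rows), group_size):
        group_id = start // group_size + 1
        group = rows[start:start + group_size]
        # Quasi-identifiers are published exactly, without generalization.
        for row in group:
            entry = {attr: row[attr] for attr in quasi_identifiers}
            entry["group_id"] = group_id
            qi_table.append(entry)
        # Sensitive values are published only as per-group counts.
        for value, count in Counter(row[sensitive] for row in group).items():
            sensitive_table.append(
                {"group_id": group_id, sensitive: value, "count": count}
            )
    return qi_table, sensitive_table


# Hypothetical original records (illustrative data only).
records = [
    {"age": 23, "zip": "11000", "disease": "flu"},
    {"age": 27, "zip": "13000", "disease": "cancer"},
    {"age": 35, "zip": "59000", "disease": "flu"},
    {"age": 59, "zip": "12000", "disease": "flu"},
]

qit, st = anatomize(records, ["age", "zip"], "disease", group_size=2)
for row in qit:
    print(row)
for row in st:
    print(row)
```

Joining the two tables on `group_id` pairs every record in a group with every sensitive value of that group, so the original associations cannot be recovered exactly. This is the sense in which the join is lossy.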

III. CONCLUSION
This paper surveys the different techniques that have been introduced so far to preserve the privacy of records. The last technique, anatomization, is considered one of the best among them. Future work can introduce a new technique in which the loss of information is lower and the security is higher. Such a technique should strongly follow the properties of k-anonymity, l-diversity, and t-closeness, so that attributes with low correlation can be grouped together and an anonymization technique can then be applied.

IV.
"...anonymity: a model for protecting privacy", International Journal Uncertain ...
"...anonymity privacy protection using generalization and suppression", International Journal Uncertain Fuzz, 10(6): 571-588, ...