A Review Paper on Leveraging Data Duplication to Improve the Performance of Storage System with CLD and EHD Image Matching in the Cloud

DRGIT&R, Amravati, Maharashtra, India

Abstract:
With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the Cloud. Existing work proposes POD, a performance-oriented deduplication scheme, to improve the performance of primary storage systems in the Cloud by leveraging data deduplication on the I/O path to remove redundant write requests while also saving storage space. This research aims to remove data duplication in the cloud, improve the performance of the storage system, and use image-processing concepts to utilize space efficiently. In this paper we discuss the design and implementation of data deduplication to improve the efficiency of storage in the cloud. The system implements wireless data access to servers. Our alternative method removes data duplication in the storage system through a web-based application that uses two matching techniques: the Color Layout Descriptor (CLD) and the Edge Histogram Descriptor (EHD). A user browses and uploads an image on a web page; we then apply the CLD and EHD techniques to check whether the uploaded image is already stored in the cloud. If a matching image exists, we extract a reference to the already-stored image and send it to the receiver, who can then receive the image. If there is no matching image, the new image is uploaded to the database. By extracting a reference to an already-stored image, the same image need not be uploaded again; in this way we remove data duplication, improve storage-space efficiency, and utilize network bandwidth better, making our system more effective than plain data deduplication at improving the performance of the primary storage system.

Introduction:
Data deduplication, often called intelligent compression or single-instance storage, is a process that eliminates redundant copies of data and reduces storage overhead. A data deduplication technique ensures that only one unique instance of the data is retained on storage media, such as disk, flash, or tape. Data deduplication has been demonstrated to be an effective technique in Cloud backup and archiving applications to reduce the backup window and to improve storage-space efficiency and network bandwidth utilization. Recent studies reveal that moderate to high data redundancy clearly exists in virtual machine (VM), enterprise, and high-performance computing (HPC) storage systems. We use the CLD and EHD techniques as a performance-oriented deduplication scheme to improve the performance of storage systems in the Cloud by leveraging data deduplication to remove redundant requests while also saving storage space. In this paper we discuss the design and implementation of data deduplication to improve the efficiency of storage in the cloud; the resulting system makes better use of network bandwidth and is more effective than the existing approach at improving the performance of the primary storage system.
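The single-instance idea can be sketched as a toy content-addressed store: every write is fingerprinted, the data itself is kept only for the first occurrence, and later writes record just a reference. This is a minimal illustration, not the reviewed scheme's actual I/O-path design; the `DedupStore` class and the choice of SHA-256 fingerprinting are our assumptions.

```python
import hashlib

class DedupStore:
    """Toy single-instance store: identical data blocks are kept only once."""
    def __init__(self):
        self.blocks = {}       # fingerprint -> actual data (one physical copy)
        self.references = []   # one fingerprint per logical write (the "pointers")

    def write(self, data: bytes) -> str:
        # Fingerprint the block; identical content always yields the same digest.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data     # first copy: store the data itself
        self.references.append(digest)     # every copy: store only a reference
        return digest

# 100 users upload the same 1 MB attachment.
store = DedupStore()
attachment = b"x" * (1024 * 1024)
for _ in range(100):
    store.write(attachment)

print(len(store.references))  # 100 logical copies
print(len(store.blocks))      # 1 physical copy: ~1 MB stored instead of ~100 MB
```

The same pattern carries over to images: the fingerprint is replaced by the CLD/EHD descriptors described below, so that visually matching images resolve to one stored copy.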

PROPOSED OBJECTIVE
What is data deduplication in cloud computing?
Data deduplication, often called intelligent compression or single-instance storage, is a process that eliminates redundant copies of data and reduces storage overhead. A data deduplication technique ensures that only one unique instance of the data is retained on storage media, such as 1) disk, 2) flash, or 3) tape. Redundant data blocks are replaced with a pointer to the unique data copy; in that way, data deduplication closely aligns with incremental backup, which copies only the data that has changed since the previous backup. For example, a typical e-mail system might contain 100 instances of the same 1 MB file attachment. If the e-mail platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is stored; each subsequent instance is referenced back to the one saved copy. In this example, the 100 MB storage demand therefore drops to 1 MB.
Color Layout Descriptor (CLD): The CLD is designed to capture the spatial distribution of color in an image. The feature extraction process consists of two parts: 1) grid-based representative color selection, and 2) a discrete cosine transform (DCT) with quantization. The functionality of the CLD is essentially matching: image-to-image matching and video-clip-to-video-clip matching. Note that the CLD is one of the most precise and fast color descriptors.
Edge Histogram Descriptor (EHD): The EHD is one of the most widely used methods for shape detection. It represents the relative frequency of occurrence of five types of edges in each local area, called a sub-image or image block. The sub-images are defined by partitioning the image space into 4x4 non-overlapping blocks, as shown in Figure 1, so the partition always creates 16 equal-sized blocks regardless of the size of the original image. To define the characteristics of each image block, we then generate a histogram of its edge distribution.
The edges within each image block are categorized into five types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal, and non-directional, as shown in Figure 2. Thus, the histogram for each image block represents the relative distribution of the five edge types in the corresponding sub-image.
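Both descriptors can be sketched in a few lines for a single-channel image. This is a simplified illustration only: the real CLD operates on YCbCr colour channels with a zig-zag coefficient scan, the EHD filter weights follow the common MPEG-7 formulation, and all function names and the threshold value here are our own assumptions.

```python
import math

# ---- Color Layout Descriptor (CLD) sketch ----

def grid_colors(img, grid=8):
    """Step 1: grid-based representative colour selection (average per cell)."""
    h, w = len(img), len(img[0])
    ch, cw = h // grid, w // grid
    return [[sum(img[y][x]
                 for y in range(gy * ch, (gy + 1) * ch)
                 for x in range(gx * cw, (gx + 1) * cw)) / (ch * cw)
             for gx in range(grid)]
            for gy in range(grid)]

def dct_2d(block):
    """Step 2: textbook 2-D DCT-II of the 8x8 representative-colour grid."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            cu = math.sqrt((1 if u == 0 else 2) / n)
            cv = math.sqrt((1 if v == 0 else 2) / n)
            out[u][v] = cu * cv * s
    return out

def cld_descriptor(img, keep=6):
    """Keep only low-frequency coefficients, coarsely quantised by rounding."""
    c = dct_2d(grid_colors(img))
    return [round(c[u][v]) for u in range(keep) for v in range(keep - u)]

def cld_distance(d1, d2):
    """Images match when this distance is small (0 for identical images)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

# ---- Edge Histogram Descriptor (EHD) sketch ----

# Filter weights for the five edge types, applied to a 2x2 pixel unit
# laid out as [top-left, top-right, bottom-left, bottom-right].
EDGE_FILTERS = {
    "vertical":        [1, -1, 1, -1],
    "horizontal":      [1, 1, -1, -1],
    "diagonal_45":     [1.414, 0, 0, -1.414],
    "diagonal_135":    [0, 1.414, -1.414, 0],
    "non_directional": [2, -2, -2, 2],
}

def ehd(img, threshold=11):
    """Edge histogram: 16 sub-images x 5 edge types = 80 bins."""
    h, w = len(img), len(img[0])
    kinds = list(EDGE_FILTERS)
    hist = [[0] * 5 for _ in range(16)]           # one 5-bin histogram per block
    for y in range(0, h - 1, 2):                  # scan 2x2 pixel units
        for x in range(0, w - 1, 2):
            unit = [img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]]
            strengths = {k: abs(sum(f * p for f, p in zip(flt, unit)))
                         for k, flt in EDGE_FILTERS.items()}
            kind = max(strengths, key=strengths.get)
            if strengths[kind] > threshold:       # weak responses are ignored
                block = (y * 4 // h) * 4 + (x * 4 // w)  # which of the 16 sub-images
                hist[block][kinds.index(kind)] += 1
    return hist
```

In the web application described above, an uploaded image would be compared against the stored descriptors; when both the CLD distance and the EHD histogram difference fall below chosen thresholds, only a reference to the already-stored image is kept instead of a second copy.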

Technical Specifications And Result Analysis:
The technologies used to implement the system are:
- Java JDK 6.0
- Eclipse
- Apache Tomcat

Result analysis:
- The system requires less storage, since duplicate data is stored only once.
- Access is efficient and fast.

Conclusion:
In this paper, we propose the CLD and EHD techniques as a performance-oriented deduplication scheme to improve the performance of storage systems in the Cloud by leveraging data deduplication to remove redundant requests while also saving storage space. We discussed the design and implementation of data deduplication to improve the efficiency of storage in the cloud. The system implements wireless data access to servers, and our alternative method removes data duplication in the storage system through a web-based application.