The Importance Of Data Mining

Data mining aims to discover novel, interesting, and useful information from databases [1]. Conventionally, such information was analysed manually, so many hidden and potentially useful relationships could go unrecognized by the analyst. Nowadays, many organizations, including modern hospitals, are capable of generating and collecting large quantities of data [2]. The collection of digital information by governments, corporations, and individuals has created an environment that facilitates large-scale data mining and data analysis. This explosive growth of data calls for an automatic way to extract useful knowledge, and there is also a demand for sharing data among various parties. For example, licensed hospitals in California are required…
Data mining emerged to extract, or mine, knowledge from these large amounts of data. It can be performed on all kinds of information repositories, including relational databases, data warehouses, transactional databases, advanced database systems, protein and gene sequence databases, social networks, flat files, and the World Wide Web. Most enterprises actively collect and store data in large databases and publish it on the web, and many use this data as an information source for making business decisions. Privacy preservation is an important concern when the published information contains specific records of individuals. In general, publishing information about an individual's records will violate that individual's privacy. But by using some methods, privacy can be achieved to great…
First, bucketization does not prevent membership disclosure. Because bucketization publishes the quasi-identifier (QI) values in their original forms, an adversary can find out whether or not an individual has a record in the published data. As shown in [6], 87 percent of the individuals in the United States can be uniquely identified using only three attributes (birth date, sex, and ZIP code). A microdata table (e.g., census data) usually contains many other attributes besides those three. This means that the membership information of most individuals can be inferred from the bucketized table. Second, bucketization requires a clear separation between QIs and sensitive attributes (SAs). However, in many data sets, it is unclear which attributes are QIs and which are SAs. Third, by separating the sensitive attribute from the QI attributes, bucketization breaks the attribute correlations between the QIs and the SAs. A new technique, slicing [7], was developed to improve on the current state of the art. Slicing partitions the data set both vertically and horizontally. Vertical partitioning is done by grouping attributes into columns based on the correlations among the attributes. Slicing breaks the association across columns, but preserves the association within each column. This reduces the dimensionality of the data and preserves better utility than generalization and…
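The slicing idea described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the text: the attribute groups are supplied by hand rather than derived from measured correlations, the horizontal partition is simply consecutive chunks of a fixed size, and the association across columns is broken by randomly permuting each column's values within a bucket. The function name `slice_table` and the sample records are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def slice_table(
    rows: Sequence[Row],
    column_groups: Sequence[Tuple[str, ...]],
    bucket_size: int,
    seed: int = 0,
) -> List[List[Tuple[tuple, ...]]]:
    """Return a list of buckets; each sliced row is a tuple of column-group values."""
    rng = random.Random(seed)
    buckets = []
    # Horizontal partitioning (assumption: fixed-size consecutive chunks).
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        columns = []
        # Vertical partitioning: project every record onto each attribute group.
        for group in column_groups:
            values = [tuple(row[attr] for attr in group) for row in bucket]
            # Shuffle each column independently inside the bucket. Values that
            # share a column stay together; links between columns are broken.
            rng.shuffle(values)
            columns.append(values)
        # Reassemble the shuffled columns into published rows.
        buckets.append(
            [tuple(col[i] for col in columns) for i in range(len(bucket))]
        )
    return buckets


# Hypothetical sample data for illustration only.
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
    {"age": 54, "sex": "M", "zip": "47302", "disease": "gastritis"},
]
# Assumed grouping: (age, sex) in one column, (zip, disease) in the other.
groups = [("age", "sex"), ("zip", "disease")]
sliced = slice_table(rows, groups, bucket_size=2)

for bucket in sliced:
    for published_row in bucket:
        print(published_row)
```

In this sketch each (zip, disease) pair is published intact, which keeps the correlation inside that column, while a reader of the output cannot tell which (age, sex) pair in the same bucket it originally belonged to.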