TY - GEN
T1 - Privacy preserving distributed learning clustering of healthcare data using cryptography protocols
AU - Elmesiry, Ahmed
AU - Fu, Huaiguo
PY - 2010/12/13
Y1 - 2010/12/13
AB - Data mining is the process of knowledge discovery in databases (centralized or distributed); it consists of different tasks, each associated with different algorithms. Nowadays, a single centralized database that maintains all the data is difficult to achieve for several reasons, including physical and geographical restrictions and the size of the data itself. One approach to this problem is distributed databases, in which different parties hold horizontal or vertical partitions of the data. The data is normally maintained by more than one organization, each of which aims to keep the information stored in its databases private; thus, privacy-preserving techniques and protocols are designed to perform data mining on distributed data when privacy is a major concern. Cluster analysis is a frequently used data mining task that aims to decompose or partition a usually multivariate data set into groups such that the data objects in one group are the most similar to each other. It plays an important role in fields such as bioinformatics, marketing, machine learning, climate, and healthcare. In this paper we introduce a novel clustering algorithm designed with the goal of enabling a privacy-preserving version of it, along with sub-protocols for secure computations, to handle the clustering of vertically partitioned data among different healthcare data providers.
KW - Clustering
KW - Cryptography
KW - Privacy
U2 - 10.1109/COMPSACW.2010.33
DO - 10.1109/COMPSACW.2010.33
M3 - Conference contribution
AN - SCOPUS:78649884242
SN - 978-0769541051
SP - 140
EP - 145
BT - Proceedings - 34th Annual IEEE International Computer Software and Applications Conference Workshops, COMPSACW 2010
T2 - 34th Annual IEEE International Computer Software and Applications Conference Workshops, COMPSACW 2010
Y2 - 19 July 2010 through 23 July 2010
ER -