Survey of Differential Privacy Algorithms and Applications for High-Dimensional Data Publishing

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
2. University of Chinese Academy of Sciences, Beijing 100049, China

Received date: 2024-01-30
Accepted date: 2024-06-03
Online published: 2024-07-03

Abstract

With the continued development of big data and machine learning technologies, handling high-dimensional data has become a major challenge. Such data contains dozens to hundreds of features and has complex structures, interrelationships, and rich semantic information. How to use it safely while protecting the privacy of individuals is now an important research topic. Our review of the existing literature found many surveys of differential privacy technology itself but few that focus on differential privacy algorithms and applications tailored to high-dimensional data. This paper therefore reviews the application of differential privacy to high-dimensional data, examining the strengths and weaknesses of different methods for protecting the privacy of such data, with the aim of guiding future research on differential privacy algorithms for high-dimensional data publishing. First, it introduces the principles and characteristics of differential privacy and summarizes current research on the technique itself. Next, it analyzes the application of differential privacy to high-dimensional data from two perspectives, data dimensionality reduction and data synthesis, discusses the challenges and open issues that differential privacy faces in this setting, and proposes preliminary solutions to better address privacy protection and data analysis in today's high-dimensional data landscape. Finally, it outlines potential future research directions to promote technical exchange and further progress in applying differential privacy to high-dimensional data.
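The abstract refers both to the principles of differential privacy and to the difficulty of applying them to high-dimensional data. The following is a minimal sketch, not the method of this paper or of any surveyed work, of the standard Laplace mechanism: a numeric query answer f(D) is released as f(D) + Lap(Δf/ε), where Δf is the query's L1 sensitivity and ε is the privacy budget. It assumes (hypothetically) records with d categorical attributes over a public value domain, neighboring datasets that differ by adding or removing one record, and an even split of ε across d one-way marginals under sequential composition. All function names are illustrative. The point it shows is that naive per-attribute release gives every count a noise scale of d/ε, which grows with dimensionality.

```python
import random
from typing import Dict, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exponential(1) draws is Laplace(0, 1),
    so multiplying it by `scale` gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Release `true_value` under epsilon-DP, given the query's L1 sensitivity."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def per_cell_noise_scale(d: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Laplace scale applied to each count when epsilon is split evenly over d marginals."""
    return sensitivity * d / epsilon


def noisy_one_way_marginals(records: Sequence[Sequence[int]], domain: Sequence[int],
                            epsilon: float, rng: random.Random) -> List[Dict[int, float]]:
    """Release one noisy histogram per attribute (hypothetical helper).

    Assumes every record has the same d attributes with values in the public
    `domain`, and add/remove-one neighbors, so each histogram has L1 sensitivity 1.
    Spending epsilon / d on each of the d histograms gives epsilon-DP overall
    by sequential composition.
    """
    if not records:
        raise ValueError("records must be non-empty")
    d = len(records[0])
    eps_per_attribute = epsilon / d
    marginals: List[Dict[int, float]] = []
    for j in range(d):
        # Count over the whole public domain so absent values are perturbed too.
        counts = {v: sum(1 for r in records if r[j] == v) for v in domain}
        marginals.append({v: laplace_mechanism(c, 1.0, eps_per_attribute, rng)
                          for v, c in counts.items()})
    return marginals


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 6 records with d = 4 binary attributes (illustrative only).
    data = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1],
            [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
    for j, marginal in enumerate(noisy_one_way_marginals(data, [0, 1], 1.0, rng)):
        print(f"attribute {j}: {marginal}")
    print("noise scale per count:", per_cell_noise_scale(d=4, epsilon=1.0))
```

Running the script prints four noisy histograms and the per-count noise scale d/ε. Because that scale grows linearly with the number of attributes, splitting the budget naively across attributes can overwhelm the true counts in high-dimensional data. This cost is one motivation for the dimensionality-reduction and data-synthesis perspectives the survey examines.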

Cite this article

LONG Chun, QIN ZeXiu, LI LiSha, LI Jing, YANG Fan, WEI JinXia, FU YuHao. Survey of Differential Privacy Algorithms and Applications for High-Dimensional Data Publishing[J]. Journal of Agricultural Big Data, 2024, 6(2): 170-184. DOI: 10.19788/j.issn.2096-6369.200001
