Journal of Agricultural Big Data ›› 2024, Vol. 6 ›› Issue (2): 170-184. doi: 10.19788/j.issn.2096-6369.200001
LONG Chun1,2,*, QIN ZeXiu1,2, LI LiSha1,2, LI Jing1, YANG Fan1, WEI JinXia1,2, FU YuHao1
Received:2024-01-30
Accepted:2024-06-03
Online:2024-06-26
Published:2024-07-03
Contact:
LONG Chun
LONG Chun, QIN ZeXiu, LI LiSha, LI Jing, YANG Fan, WEI JinXia, FU YuHao. Survey of Differential Privacy Algorithms and Applications for High-Dimensional Data Publishing[J]. Journal of Agricultural Big Data, 2024, 6(2): 170-184.
Table 1
Significant data breaches in the first quarter of 2023
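| Date | Incident | Data exposed | People affected |
|---|---|---|---|
| February 2023 | People Connect data breach | Names, email addresses, phone numbers, passwords, etc. | 20.2 million |
| March 2023 | MCNA insurance company data breach | Full names, dates of birth, addresses, phone numbers, Social Security numbers, driver's license numbers, etc. | 8.92 million |
| March 2023 | Pharma company data breach | Names, addresses, dates of birth, Social Security numbers, medication and health insurance information. | 5.8 million |
| February 2023 | TMX Finance data breach | Names, dates of birth, Social Security numbers, passport numbers, driver's license numbers, and tax identification numbers. | 4.8 million |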
Table 3
Differential privacy release techniques for high-dimensional data
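| Method | Main idea | Communication cost | Computational complexity | Data utility |
|---|---|---|---|---|
| PrivBayes | Builds a Bayesian network, computes the marginal probability distribution of each attribute in the dataset and injects noise, then uses the set of noisy marginals together with the Bayesian network to construct an approximate representation of the dataset's distribution. | Low | Low | Low |
| SS-PrivBayes | Follows an approach similar to PrivBayes, but uses the more precise smooth sensitivity when injecting noise. | Low | High | High |
| DP-SUBN | Generates a synthetic dataset by sequentially and collaboratively determining the Bayesian network, constructing search boundaries, quantifying correlations, and determining the optimal parameters. | High | High | High |
| MEPrivBayes | Builds a k-degree Bayesian network, selects the node with the largest AMIC as the first node, and adds noise to the d-k joint probability distributions to generate a synthetic dataset. | High | Low | High |
| PrivMN | Builds a Markov model to represent the relationships among attributes, uses belief propagation on a cluster graph for approximate inference to produce low-dimensional marginal tables, adds noise to obtain noisy marginal tables, and finally combines them with the Markov model to publish a synthetic dataset. | High | High | High |
| LoPub | Locally perturbed data are sent to a server for aggregation; the server estimates the joint probability distribution of the data and reduces its dimensionality, then samples the reduced data according to the estimated distribution to generate synthetic data. | High | High | Low |
| PrivIncr | Uses an incremental-learning-based method to construct the probabilistic graph, progressively pruning weakly correlated edges and allocating more data and privacy budget to useful edges to improve model accuracy. | Low | High | High |
| PrivHDP | Uses perturbation and joint distribution estimation under local differential privacy, combined with correlation measures and threshold filtering, to generate a synthetic dataset that approximates the original data. | High | High | High |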
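Several methods in Table 3 share one building block: perturbing a marginal distribution with noise and then sampling synthetic records from it. The sketch below illustrates only that step for a single categorical attribute, using the Laplace mechanism. It is not the PrivBayes algorithm or any other method from the table (there is no Bayesian network, and no budget is split across attributes). The function names, the toy domain, the replace-one-record neighbouring model, and the value epsilon = 1.0 are all assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(values, domain, epsilon, rng):
    """Estimate one attribute's marginal distribution with Laplace noise."""
    counts = Counter(values)
    # Assumption: neighbouring datasets differ by replacing one record,
    # which changes at most two histogram cells by 1 each (L1 sensitivity 2).
    scale = 2.0 / epsilon
    # Clipping and normalising are post-processing, so they do not
    # consume additional privacy budget.
    noisy = [max(counts[v] + laplace_noise(scale, rng), 0.0) for v in domain]
    total = sum(noisy)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(domain)] * len(domain)
    return [x / total for x in noisy]


rng = random.Random(0)                      # fixed seed for repeatability
domain = ["A", "B", "C", "D"]               # hypothetical attribute domain
records = [rng.choice(domain) for _ in range(1000)]

marginal = noisy_marginal(records, domain, epsilon=1.0, rng=rng)
# Synthetic values are drawn from the noisy marginal, not from the raw data.
synthetic = rng.choices(domain, weights=marginal, k=1000)
```

With many attributes, noising the full joint distribution this way becomes impractical because the number of cells grows exponentially with the dimension; the methods in Table 3 instead factor the joint distribution into low-dimensional marginals before adding noise.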
| [1] | Zeng D D, Liu Y, Yan P, et al. Location-aware real-time recommender systems for Brick-and-Mortar Retailers[J]. INFORMS Journal on Computing, 2021, 33:1608-1623. https://doi.org/10.1287/ijoc.2020.1020. |
| [2] | The EU General Data Protection Regulation (GDPR)[EB/OL]. https://eur-lex.europa.eu/eli/reg/2016/679/oj. |
| [3] | The California Consumer Privacy Act (CCPA)[EB/OL]. https://cdp.cooley.com/ccpa-2018/. |
| [4] | Data Security Law of the People's Republic of China[EB/OL]. [2021-06-13]. https://www.gov.cn/xinwen/2021-06/11/content_5616919.htm. |
| [5] | Dwork C, McSherry F, Nissim K, et al. Calibrating Noise to Sensitivity in Private Data Analysis[C]. Theory of Cryptography Conference, Lecture Notes in Computer Science, 2006. https://doi.org/10.1007/11681878_14. |
| [6] | Erlingsson Ú, Korolova A, Pihur V. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response[C]. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014. https://doi.org/10.1145/2660267.2660348. |
| [7] | Ding X, Wang C, Choo K R, et al. A novel privacy preserving framework for large scale graph data publishing[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 33: 331-343. https://doi.org/10.1109/TKDE.2019.2931903. |
| [8] | Draft NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management. [EB/OL]. https://www.nist.gov/system/files/documents/2020/01/16/NIST%20Privacy%20Framework_V1.0.pdf. |
| [9] | Li X Y, Sun Z L, Deng J B, et al. A comprehensive review of privacy protection technologies[J]. Computer Science, 2013, 40(S2): 199-202. |
| [10] | Li Y, Wen W, Xie G Q. A review of differential privacy protection research[J]. Journal of Computer Applications Research, 2012, 29(9): 3201-3205+3211. |
| [11] | Zhao Y Q, Yang M. A review of research progress on differential privacy[J]. Journal of Computer Science, 2023, 50(4): 265-276. |
| [12] | Gao Z Q, Wang Y T. Research progress on differential privacy techniques[J]. Journal of Communications, 2017, 38(S1): 151-155. |
| [13] | Ye Q Q, Meng X F, Zhu M J, et al. A review of local differential privacy research[J]. Journal of Software, 2018, 29(7): 1981-2005. |
| [14] | Liu J X, Meng X F. A review of privacy protection in machine learning[J]. Journal of Computer Research and Development, 2020, 57(2): 346-362. |
| [15] | Ouadrhiri A E, Abdelhadi A M. Differential privacy for deep and federated learning: a survey[J]. IEEE Access, 2022, 10: 22359-22380. https://ieeexplore.ieee.org/document/9714350. |
| [16] | Kong Y T, Tan F X, Zhao X, et al. A review of research on optimization of k-means algorithm based on differential privacy[J]. Journal of Computer Science, 2022, 49(2): 162-173. |
| [17] | Wang T, Huo Z, Huang Y X, et al. A review of privacy protection technologies in federated learning[J]. Journal of Computer Applications, 2023, 43(2): 437-449. |
| [18] | Narayanan A, Shmatikov V. Robust De-anonymization of Large Sparse Datasets[C]. 2008 IEEE Symposium on Security and Privacy (sp 2008), 2008. https://ieeexplore.ieee.org/document/4531148. |
| [19] | Ouadrhiri A E, Abdelhadi A M. Differential privacy for deep and federated learning: a survey[EB/OL]. IEEE Access, 2022, 10: 22359-22380. https://ieeexplore.ieee.org/document/9714350. |
| [20] | Chu X J. Research on High-Dimensional Data Publishing Method Meeting Local Differential Privacy[D]. Guizhou University, 2022. DOI: 10.27047/d.cnki.ggudu.2022.002172. |
| [21] | Amaratunga D, Cabrera J, Shkedy Z. Exploration and Analysis of DNA Microarray and Other High-Dimensional Data (2nd edition)[EB/OL]. https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118364505.fmatter. |
| [22] | Sweeney L. k-Anonymity: A model for protecting privacy[J]. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2002, 10: 557-570. https://doi.org/10.1142/S0218488502001648. |
| [23] | Cai M G, Shen G H, Huang Z Q, et al. High-dimensional data publishing method under local differential privacy[J]. Journal of Computer Science, 2024, 51(2): 322-332. |
| [24] | Zhang X, Chen H. A review of high-dimensional data publishing with differential privacy[J]. Journal of Intelligent Systems, 2021, 16(6): 989-998. |
| [25] | Warner S L. Randomized response: a survey technique for eliminating evasive answer bias[J]. Journal of the American Statistical Association, 1965, 60(309): 63-69. PMID: 12261830. |
| [26] | McSherry F, Talwar K. Mechanism Design via Differential Privacy[C]. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), 2007. https://doi.org/10.1109/FOCS.2007.41. |
| [27] | Kairouz P, Oh S, Viswanath P. Extremal Mechanisms for Local Differential Privacy[J]. Journal of Machine Learning Research, 2016. https://jmlr.org/papers/v17/15-135.html. |
| [28] | Kairouz P, Bonawitz K A, Ramage D. Discrete Distribution Estimation under Local Privacy[C]. International Conference on Machine Learning. 2016. https://proceedings.mlr.press/v48/kairouz16.html. |
| [29] | Ma X, Liu H, Guan S. Improving the Effect of Frequent Itemset Mining with Hadamard Response under Local Differential Privacy[C]. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021. https://doi.org/10.1109/TrustCom53373.2021.00072. |
| [30] | Kikuchi H. Castell: Scalable Joint Probability Estimation of Multi-dimensional Data Randomized with Local Differential Privacy[EB/OL]. https://arxiv.org/abs/2212.01627. |
| [31] | Murakami T, Kawamoto Y. Utility-Optimized Local Differential Privacy Mechanisms for Distribution Estimation[C]. USENIX Security Symposium. 2019. https://www.usenix.org/conference/usenixsecurity19/presentation/murakami. |
| [32] | Duchi J C, Wainwright M J, Jordan M I. Minimax optimal procedures for locally private estimation[J]. Journal of the American Statistical Association, 2016, 113: 182-201. https://arxiv.org/abs/1604.02390. |
| [33] | Wang N, Xiao X, Yang Y D, et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy[C]. 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019. https://ieeexplore.ieee.org/document/8731512/. |
| [34] | Li W, Zhang X, Li X, et al. PPDP-PCAO: An efficient high-dimensional data releasing method with differential privacy protection[J]. IEEE Access, 2019, 7: 176429-176437. https://ieeexplore.ieee.org/document/8924645. |
| [35] | Chaudhuri K, Sarwate A D, Sinha K. A near-optimal algorithm for differentially-private principal components[J]. Journal of Machine Learning Research, 2013, 14: 2905-2943. https://dl.acm.org/doi/10.5555/2567709.2567754. |
| [36] | Jiang X, Ji Z, Wang S, et al. Differential-Private Data Publishing Through Component Analysis[J]. Transactions on Data Privacy, 2013, 6(1): 19-34. https://www.tdp.cat/issues11/abs.a109a12.php. PMID: 24409205. |
| [37] | Yang J, Li Y. Differentially private feature selection[C]// 2014 International Joint Conference on Neural Networks (IJCNN), 2014. https://www.sciencedirect.com/science/article/pii/S1877050914010412?via%3Dihub. |
| [38] | Zhang J, Cormode G, Procopiuc C M, et al. PrivBayes: private data release via Bayesian networks[C]. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014. https://dl.acm.org/doi/10.1145/2588555.2588573. |
| [39] | Li M, Ma X. Bayesian Networks-Based Data Publishing Method Using Smooth Sensitivity[C]. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), 2018. https://ieeexplore.ieee.org/xdocument/8672292. |
| [40] | Cheng X, Tang P, Su S, et al. Multi-Party High-Dimensional Data Publishing Under Differential Privacy[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32: 1557-1571. https://ieeexplore.ieee.org/document/8673599/. |
| [41] | Lu X, Piao C, Han J. Differential Privacy High-dimensional Data Publishing Method Based on Bayesian Network[C]. 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), 2022. https://ieeexplore.ieee.org/document/9853392. |
| [42] | Wei F, Zhang W, Chen Y, et al. Differentially Private High-Dimensional Data Publication via Markov Network[C]. Security and Privacy in Communication Networks. 2018. https://link.springer.com/chapter/10.1007/978-3-030-01701-9_8. |
| [43] | Ren X, Yu C, Yu W, et al. LoPub: High-Dimensional Crowdsourced Data Publication With Local Differential Privacy[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(9): 2151-2166. https://ieeexplore.ieee.org/document/8306916. |
| [44] | Liu G, Tang P, Hu C, et al. Multi-Dimensional Data Publishing With Local Differential Privacy[C]. International Conference on Extending Database Technology. 2023. https://openproceedings.org/2023/conf/edbt/paper-210.pdf. |
| [45] | Ray P, Reddy S S, Banerjee T S. Various dimension reduction techniques for high dimensional data analysis: a review[J]. Artificial Intelligence Review, 2021, 54: 3473-3515. https://link.springer.com/article/10.1007/s10462-020-09928-0. |
| [46] | Chen Y. Research on health and medical data sharing and personal information protection[J]. Journal of Intelligence, 2023, 42(5): 192-199. |
| [47] | Bai W T, Chen L X. A health medical data protection scheme based on differential privacy[J]. Computer Applications and Software, 2022, 39(8): 304-311. |
| [48] | Zhang S, Li X. Differential privacy medical data publishing method based on attribute correlation[J]. Scientific Reports, 2022, 12. https://www.nature.com/articles/s41598-022-19544-3.pdf. |
| [49] | Rong J. An electronic medical record data security risk monitoring system based on differential privacy protection[J]. Automation Technology and Applications, 2022, 41(12): 169-172. |
| [50] | Tan L. Theory and Application of Dimensionality Reduction for High-dimensional Data[D]. National University of Defense Technology, 2005. |
| [51] | Yuan K, Cheng Y. Data risks of financial technology and its prevention and control strategies[J]. Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition), 2023, 36(2): 46-58. |
| [52] | Zhu Z W, Zhang X. A brief analysis of privacy computing applications in the financial field[J]. Research on Financial Development, 2023(3): 90-92. |
| [53] | Deng W, Chen X T, Zhang Q H, et al. Differential privacy protection algorithm based on Tree Models[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2020, 32(5): 848-856. |
| [54] | Byrd D, Polychroniadou A. Differentially private secure multi-party computation for federated learning in financial applications[C]. Proceedings of the First ACM International Conference on AI in Finance. 2020. https://dl.acm.org/doi/10.1145/3383455.3422562. |

