Journal of Agricultural Big Data ›› 2024, Vol. 6 ›› Issue (2): 170-184. doi: 10.19788/j.issn.2096-6369.200001

• Special issue: "Scientific Data Security for High-Quality Sharing" (Part I)

Survey of Differential Privacy Algorithms and Applications for High-Dimensional Data Publishing

LONG Chun1,2,*, QIN ZeXiu1,2, LI LiSha1,2, LI Jing1, YANG Fan1, WEI JinXia1,2, FU YuHao1

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-01-30 Accepted: 2024-06-03 Published: 2024-06-26 Online: 2024-07-03
  • Corresponding author: * LONG Chun
  • About the author: LONG Chun, E-mail: longchun@cnic.cn
  • Funding:
    National Key Research and Development Program of China: Research on Security Risk Assessment, Monitoring, and Traceability Technologies for Full-Lifecycle Financial Data Circulation (2023YFC3304704); Chinese Academy of Sciences Special Project on Cybersecurity and Informatization (CAS-WX2022GC-04); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2022170)

Abstract:

As big data and machine learning technologies continue to develop, handling high-dimensional data, which has dozens to hundreds of features, complex structures and relationships, and rich semantic information, has become a challenge. How to use such data safely while protecting individual privacy is now an important topic. Our literature review found many surveys of differential privacy itself but few of differential privacy algorithms and applications for high-dimensional data publishing. This paper therefore surveys the application of differential privacy to high-dimensional data, compares the strengths and weaknesses of different methods for protecting the privacy of such data, and aims to guide future research on differential privacy algorithms for high-dimensional data publishing. It first introduces the principles and properties of differential privacy and summarizes current research on the technique itself. It then analyzes the application of differential privacy in high-dimensional settings from two perspectives, data dimensionality reduction and data synthesis, discusses the problems and challenges that differential privacy faces, and proposes preliminary solutions to better address the protection and use of high-dimensional data. Finally, it proposes possible future research directions to promote technical exchange and further progress in applying differential privacy to high-dimensional data.

Key words: differential privacy, high-dimensional data, perturbation mechanism, privacy allocation
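
The abstract names perturbation mechanisms and privacy allocation without giving formulas. As a rough illustration only, and not a method taken from the surveyed work, the sketch below applies the standard Laplace mechanism to per-attribute histograms and splits a total privacy budget evenly across attributes under sequential composition. The function names, the even split, and the add-or-remove-one-record neighbouring definition (which gives each histogram an L1 sensitivity of 1) are assumptions made for this example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, domain, epsilon, rng):
    """Release one attribute's histogram with the Laplace mechanism.

    Assumes add-or-remove-one-record neighbours, so a single record
    changes exactly one bin by 1 (L1 sensitivity 1).
    """
    counts = Counter(values)
    scale = 1.0 / epsilon  # scale = sensitivity / epsilon
    return {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}


def publish_marginals(records, domains, total_epsilon, rng):
    """Release every one-way marginal under a shared budget.

    Hypothetical even allocation: each attribute receives
    total_epsilon / d, so the budgets sum to total_epsilon
    (sequential composition).
    """
    d = len(domains)
    eps_each = total_epsilon / d
    released = []
    for j, domain in enumerate(domains):
        column = [record[j] for record in records]
        released.append(noisy_histogram(column, domain, eps_each, rng))
    return released, eps_each


if __name__ == "__main__":
    rng = random.Random(0)
    domains = [range(4)] * 10  # 10 attributes, 4 categories each
    records = [tuple(rng.randrange(4) for _ in domains) for _ in range(1000)]
    released, eps_each = publish_marginals(records, domains, 1.0, rng)
    print("per-attribute epsilon:", eps_each)
    print("per-bin noise scale:", 1.0 / eps_each)
```

With an even split, the per-bin noise scale is d / ε, so it grows linearly with the number of attributes d. This is one simple way to see why naive perturbation degrades on high-dimensional data and why dimensionality reduction and data synthesis are studied as alternatives.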