Journal of Agricultural Big Data ›› 2023, Vol. 5 ›› Issue (4): 24-36. doi: 10.19788/j.issn.2096-6369.230403
Received: 2023-10-18
Accepted: 2023-12-08
Online: 2023-12-26
Published: 2024-01-05
Corresponding author: KOU YuanTao, E-mail: kouyuantao@caas.cn
About the author: ZHANG Jie
ZHANG Jie, ZHU Liang, KOU YuanTao*
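Abstract:
This paper summarizes the state of research on personalized academic text retrieval in China and abroad, in order to offer ideas and an outlook for subsequent work on personalized academic retrieval. A total of 154 relevant publications from China and abroad were collected. Using literature analysis, a research framework for personalized academic text retrieval was derived, and its core and auxiliary research topics are discussed in detail. Research on personalized academic text retrieval has gradually become systematic and has moved from purely theoretical work to theory and practice in parallel. Three problems remain open: low-burden, high-privacy interaction methods have not yet been achieved; deep personalized retrieval oriented to cognitive factors has not yet been achieved; and preliminary research on identifying the contexts in which personalization is applicable is missing. Future development lies in actively embracing the capabilities offered by new technologies such as large models and in moving toward cognition-oriented, context-embedded, and real-time interactive retrieval.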
ZHANG Jie, ZHU Liang, KOU YuanTao. Research Review on Personalized Text Retrieval in the Academic Scene[J]. Journal of Agricultural Big Data, 2023, 5(4): 24-36.
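Table 1
Summary and comparison of user data collection channels
No. | Collection channel | Type | Collectable content | Limitations | Advantages |
---|---|---|---|---|---|
1 | Website subscription | Explicit | Personal information, research direction, topics of interest, etc. | Users must fill in the data voluntarily; the data become outdated over time | No additional user modeling work is needed |
2 | Questionnaire | Explicit | Personal information, research direction, topics of interest, usage suggestions, etc. | Users must fill in the data voluntarily and the data need additional collation; the data become outdated over time | Suited to collecting user information in specific experimental settings |
3 | Browser cache | Implicit | URL visit records, etc. | Visit data are limited to the current browser; cached data must be uploaded periodically | No additional tools need to be installed on the user's device |
4 | Proxy server | Implicit | URL visit records, etc. | The information system must be accessed through the proxy server | Users can access the information system with any browser |
5 | Desktop agent | Implicit | All operation records | An additional desktop agent tool must be installed | Can record all of the user's operations on the current device |
6 | Website logs | Implicit | All operation and access records, etc. | Website event tracking and logging events must be configured in advance | Independent of access device and browser; records all behavioral data |
7 | Mouse trajectory recording | Implicit | Mouse movement trajectories and click behavior | A mouse-movement monitoring tool must be installed | Records mouse movement trajectories in real time |
8 | Eye tracking | Implicit | Eye movements and focus of attention | An eye-tracking tool must be worn or installed | In constrained environments, records the focus of attention in real time through eye movements |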
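Table 3
Comparison of the elements of different evaluation frameworks for personalized academic retrieval
Element | Direct validation | Indirect validation (click-based) | Indirect validation (citation-based) |
---|---|---|---|
Basic procedure | Recruit real users, collect their modeling data and feedback data, and evaluate the personalized retrieval model directly | Collect the search histories of real users and evaluate with user clicks as the relevance criterion | Collect a dataset of papers and their cited references and evaluate under the assumption that a paper's references are the documents its author considers relevant to it |
Test collection | A set of papers built according to the recruited users' fields and the needs of the retrieval tasks | The indexed literature of the platform from which the search histories were obtained, or a subset of it | A domain literature set and the set of papers it cites |
Candidate queries | User-defined or preset exploratory retrieval tasks | Past queries saved in the search history | Can be extracted from key fields of a paper such as its title and keywords |
User feedback | Obtained from the recruited users through scoring or qualitative assessment | A user click indicates that the paper is relevant to the query | A scholar's citation indicates that the cited paper is relevant to the query |
User model | Personal information and preference data obtained through oral interviews or in writing | Interests and preferences mined from the user's search history | Academic interests and preferences mined from the scholar's publications |
Representative datasets | No public dataset is currently available because of privacy and construction costs | CiteData[ | PERSON[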
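
The citation-based indirect validation in Table 3 can be illustrated with a minimal sketch. It assumes that the references of a paper serve as the relevance judgments for a query derived from that paper, and it scores a ranked result list with precision at k and average precision. The paper and document identifiers, the `cited_by` mapping, and the ranked list are hypothetical toy data, not taken from CiteData, PERSON, or any dataset discussed in the article.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical toy data: the references of "paperA" act as relevance judgments
# for a query extracted from its title or keywords (an assumption of this sketch).
cited_by: Dict[str, Set[str]] = {"paperA": {"d1", "d3"}}

# Hypothetical output of a personalized ranker for that query.
ranked_results: List[str] = ["d1", "d2", "d3", "d4"]

p_at_2 = precision_at_k(ranked_results, cited_by["paperA"], k=2)
ap = average_precision(ranked_results, cited_by["paperA"])
print(f"P@2 = {p_at_2:.3f}, AP = {ap:.3f}")
```

The click-based variant in the same table would use the same metrics, with the set of documents a user clicked for a logged query replacing the reference set.
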
[1] | Shen X, Tan B, Zhai C X. UCAIR: capturing and exploiting context for personalized search[C]// Proceedings of the ACM SIGIR 2005 Workshop on Information Retrieval in Context (IRiX). 2005, 45. |
[2] | Teevan J, Dumais S T, Horvitz E. Beyond the commons: Investigating the value of personalizing web search[C]// Proceedings of the Workshop on New Technologies for Personalized Information Access (PIA). 2005: 84-92. |
[3] | Zhao J. Personalized information retrieval and its functional model[J]. Library and Information, 2004(1): 72-74. (in Chinese) |
[4] | Zhu Q D, Pang H S. A review of research on personalized retrieval in search engines[J]. Journal of Library Science, 2008(6): 14-17. (in Chinese) |
[5] | Li S Q. A survey of personalized information retrieval techniques[J]. Information Studies: Theory & Application, 2009, 32(5): 107-113. (in Chinese) |
[6] | Liu J, Liu C, Belkin N J. Personalization in text information retrieval: A survey[J]. Journal of the Association for Information Science and Technology, 2020, 71(3): 349-369. doi: 10.1002/asi.v71.3 |
[7] | Goldenberg D, Kofman K, Albert J, et al. Personalization in practice: Methods and applications[C]// Proceedings of the 14th ACM international conference on web search and data mining. 2021: 1123-1126. |
[8] | Rafieian O, Yoganarasimhan H. AI and personalization//Sudhir K,Toubia O (Ed.). Artificial Intelligence in Marketing (Review of Marketing Research, Vol. 20)[M]. Emerald Publishing Limited, 2023: 77-102. https://omidraf.github.io/data/Personalization.pdf. |
[9] | Sendhilkumar S, Geetha T V. Architecture for effective personalised web search[J]. International Journal of Computer Applications in Technology, 2009, 35(2-4): 219-233. doi: 10.1504/IJCAT.2009.026599 |
[10] | Gemechu F, Yu Z, Ting L. A framework for personalized information retrieval model[C]// 2010 Second International Conference on Computer and Network Technology. IEEE, 2010: 500-505. |
[11] | Wang H, Wong K. Personalized search: An interactive and iterative approach[C]// 2014 IEEE World Congress on Services. IEEE, 2014: 3-10. |
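[12] | Li Y L, Zhang J. A task-based user model for personalized information retrieval[J]. Information Studies: Theory & Application, 2015, 38(5): 60-65. (in Chinese) |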
[13] | Choochaiwattana W. An architecture of an academic search engine with personalized search result ranking mechanism[C]// Proceedings of the Fifth International Conference on Network, Communication and Computing. 2016: 161-165. |
[14] | Jing-sen L, Guan-zhong D, Yu L. A Personalized Retrieval System with Preserving Privacy[C]// 2008 3rd IEEE Conference on Industrial Electronics and Applications. IEEE, 2008: 2444-2448. |
[15] | Sánchez D, Castellà-Roca J, Viejo A. Knowledge-based scheme to create privacy-preserving but semantically-related queries for web search engines[J]. Information Sciences, 2013, 218: 17-30. doi: 10.1016/j.ins.2012.06.025 |
[16] | Romero-Tris C, Castellà D, Viejo A, et al. Design of a P2P network that protects users’ privacy in front of Web Search Engines[J]. Computer Communications, 2015, 57: 37-49. doi: 10.1016/j.comcom.2014.09.003 |
[17] | Abri S, Abri R, Çetin S. A classification on different aspects of user modelling in personalized web search[C]// Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval. 2020: 194-199. |
[18] | Nie X. On personalized information services in digital libraries[J]. Information Science, 2005(2): 208-212. (in Chinese) |
[19] | Oard D W, Kim J. Modeling information content using observable behavior[J]. Journal of the Association for Information Science and Technology, 2001:481-488. |
[20] | Gauch S, Speretta M, Chandramouli A, et al. User profiles for personalized information access[J]. The adaptive Web: methods and strategies of Web personalization, 2007: 54-89. |
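[21] | Hong Y, Wang J, Wang K, et al. A quantitative analysis method of mouse behavior for satisfaction prediction[J]. Chinese Journal of Computers, 2015, 38(10): 2064-2075. (in Chinese) |
[22] | Chen Y Q. Research on a personalized real-time query expansion model based on eye movement and topic models[D]. Tianjin: Tianjin University, 2016. (in Chinese) |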
[23] | Granka L A, Joachims T, Gay G. Eye-tracking analysis of user behavior in WWW search[C] // Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. 2004: 478-479. |
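[24] | Xia L X, Zhou D, Qin X Q, et al. Emotional changes and eye-movement characteristics in exploratory search for academic information[J]. Journal of Intelligence, 2022, 41(4): 135-143. (in Chinese) |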
[25] | Micarelli A, Gasparetti F, Sciarrone F, et al. Personalized search on the world wide web[J]. The adaptive web: Methods and strategies of web personalization, 2007: 195-230. |
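[26] | Jiang X, Tan A H. Learning and inferencing in user ontology for personalized Semantic Web search[J]. Information Sciences, 2009, 179(16): 2794-2808. doi: 10.1016/j.ins.2009.04.005 |
[27] | Zhang K Z, Liu Y H, Huang F, et al. A user-interest-oriented personalized semantic query expansion method[J]. New Technology of Library and Information Service, 2008(8): 48-52. (in Chinese) |
[28] | Peng J, Lu M, Yang F Y. Research on ontology-based personalized knowledge retrieval in digital libraries[J]. Information Studies: Theory & Application, 2009, 32(5): 78-80. (in Chinese) |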
[29] | Park Y. Recommending personalized search terms for assisting exploratory website search[C]// 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, 2019: 404-405. |
[30] | Frihat S. Context-sensitive, personalized search at the Point of Care[C]// Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries. 2022: 1-2. |
[31] | Frias‐Martinez E, Chen S Y, Liu X. Automatic cognitive style identification of digital library users for personalization[J]. Journal of the American Society for Information Science and Technology, 2007, 58(2): 237-251. doi: 10.1002/asi.v58:2 |
[32] | Tian X, Du X, Hu H, et al. Modeling individual cognitive structure in contextual information retrieval[J]. Computers & Mathematics with Applications, 2009, 57(6): 1048-1056. doi: 10.1016/j.camwa.2008.10.059 |
[33] | Zeng Y, Zhou E, Wang Y, et al. Research interests: Their dynamics, structures and applications in unifying search and reasoning[J]. Journal of Intelligent Information Systems, 2011, 37: 65-88. doi: 10.1007/s10844-010-0144-1 |
[34] | Krapp A, Prenzel M. Research on interest in science: Theories, methods, and findings[J]. International Journal of Science Education, 2011, 33(1): 27-50. doi: 10.1080/09500693.2010.518645 |
[35] | Zeng Y, Wang Y, Huang Z, et al. User interests: Definition, vocabulary, and utilization in unifying search and reasoning[C]// Active Media Technology:6th International Conference, AMT 2010, Toronto, Canada, August 28-30, 2010. Proceedings 6. Springer Berlin Heidelberg, 2010: 98-107. |
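[36] | Sun Y S, Liu W, Qiu R R, et al. Research progress on user interest modeling in China[J]. Journal of Intelligence, 2013, 32(5): 145-149+165. (in Chinese) |
[37] | Xu F, Ying J R. A review of user profile research in China and abroad[J]. Research on Library Science, 2020(12): 7-16. (in Chinese) |
[38] | Sun Y S. Research progress on ontology-based user interest modeling in China (Part I): Foundations, frameworks and applications[J]. Information Studies: Theory & Application, 2014, 37(12): 133-137. (in Chinese) |
[39] | Zhang B, Xu J M, Wu J. Research on a knowledge-graph-based user interest expansion model in the big data environment[J]. Journal of Modern Information, 2021, 41(8): 36-44. doi: 10.3969/j.issn.1008-0821.2021.08.004 (in Chinese) |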
[40] | Zhang J, Du R, Zhu L, et al. Academic User Interest Extraction using Multi-feature TextRank Based on Interest Attenuation[C]// Proceedings of the 5th International Conference on Information Management and Management Science. 2022: 77-84. |
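[41] | Li Y Y, Li X H. Research on dynamic user interest modeling combining ontology and social tags[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(4): 436-449. (in Chinese) |
[42] | Hu J M, Hu C P. User modeling based on topic hierarchy trees and a semantic vector space model[J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(8): 838-843. (in Chinese) |
[43] | Shi B M, He Y X, Zhang Y. Research on user interest modeling and updating in personalized information retrieval[J]. Computer Applications and Software, 2014, 31(3): 7-10. (in Chinese) |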
[44] | Algarni A, Li Y, Xu Y. Selected new training documents to update user profile[C]// Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010: 799-808. |
[45] | Li Y L, Hu L L. Information seeking and searching based on environment and context[J]. Information Science, 2012, 30(1): 110-114. (in Chinese) |
[46] | Schilit B, Adams N, Want R. Context-aware computing applications[C] //The 1st International Workshop on Mobile Computing Systems and Applications.Santa Cruz, CA, USA:IEEE, 1994:85-90. |
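[47] | Hong Y. A review of research on context-aware information retrieval[J]. Library and Information Service, 2014, 58(16): 143-148. doi: 10.13266/j.issn.0252-3116.2014.16.022 (in Chinese) |
[48] | Ge G L, Yuan L Y, Wang X C. Context-aware modeling of users' personalized interests[J]. Application Research of Computers, 2017, 34(4): 995-999. (in Chinese) |
[49] | Zhao R X, Zhang J, Kou Y T, et al. Research on a one-stop discovery service for agricultural science and technology information resources[J]. Digital Library Forum, 2017(11): 7. (in Chinese) |
[50] | Wang R X, Fang J, Li X, et al. Construction and analysis of a taxonomy of academic query intent: An empirical study of Baidu Scholar query logs[J]. Library and Information Service, 2021, 65(4): 73-80. doi: 10.13266/j.issn.0252-3116.2021.04.008 (in Chinese) |
[51] | Wang R X, Fang J, Gui S S, et al. Construction of an academic query intent classifier based on deep learning algorithms[J]. Library and Information Service, 2021, 65(3): 93-99. doi: 10.13266/j.issn.0252-3116.2021.03.012 (in Chinese) |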
[52] | Belkin N J, Oddy R N, Brooks H M. ASK for information retrieval: Part I. Background and theory[J]. Journal of Documentation, 1982, 38(2): 61-71. doi: 10.1108/eb026722 |
[53] | Chen S Y, Magoulas G D, Dimakopoulos D. A flexible interface design for web directories to accommodate different cognitive styles[J]. Journal of the American Society for Information Science and Technology, 2005, 56(1): 70-83. doi: 10.1002/asi.v56:1 |
[54] | Zhang L L, Huang K. Research on the information retrieval behavior of digital library users based on cognitive style[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(11): 1164-1174. (in Chinese) |
[55] | Frias-Martinez E, Chen S Y, Liu X. Evaluation of a personalized digital library based on cognitive styles: Adaptivity vs. adaptability[J]. International Journal of Information Management, 2009, 29(1): 48-56. doi: 10.1016/j.ijinfomgt.2008.01.012 |
[56] | Liu P, Ye F Q, Yang Z W. Research on an interactive information retrieval model from the perspective of cognitive construction[J]. Documentation, Information & Knowledge, 2020(2): 93-101+122. (in Chinese) |
[57] | Rahman M M, Abdullah N A. A personalized group-based recommendation approach for Web search in E-learning[J]. IEEE Access, 2018, 6: 34166-34178. doi: 10.1109/ACCESS.2018.2850376 |
[58] | Belkin N J. Anomalous states of knowledge as a basis for information retrieval[J]. Canadian journal of information science, 1980, 5(1): 133-143. |
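[59] | Song W, Zhang Y, Liu T, et al. Research on personalized query reformulation based on search history context[J]. Journal of Chinese Information Processing, 2010, 24(3): 55-61. (in Chinese) |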
[60] | Tanapaisankit P, Watrous-deVersterre L, Song M. Personalized query expansion in the QIC system[C]// Proceedings of the 12th ACM/IEEE- CS Joint Conference on Digital Libraries. 2012: 259-262. |
[61] | Van T T, Beigbeder M. A comparison of re-ranking methods in digital libraries using user profiles[C]// 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. IEEE, 2008, 1: 751-754. |
[62] | Aloteibi S, Clark S. Learning to personalize for web search sessions[C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020: 15-24. |
[63] | Tang X L, He Y. Research on a personalized retrieval model based on query context[J]. Library and Information Service, 2011, 55(9): 122-125. (in Chinese) |
[64] | Zhang X, Li Y, Liu J, et al. Effects of interaction design in digital libraries on user interactions[J]. Journal of Documentation, 2008, 64(3): 438-463. doi: 10.1108/00220410810867623 |
[65] | Kuurstra J. Individual Differences in Human-Computer Interaction: A review of empirical studies[D]. University of Twente, 2015. |
[66] | Ke Q, Zhou H H. Research on interactive information retrieval behavior based on differences in users' cognitive styles[M]. Beijing: Science Press, 2017. (in Chinese) |
[67] | Chen S Y, Macredie R D. Cognitive styles and hypermedia navigation: Development of a learning model[J]. Journal of the American Society for Information Science and Technology, 2002, 53(1): 3-15. doi: 10.1002/asi.v53:1 |
[68] | Chen S Y, Magoulas G D, Dimakopoulos D. A flexible interface design for web directories to accommodate different cognitive styles[J]. Journal of the American Society for Information Science and Technology, 2005, 56(1): 70-83. doi: 10.1002/asi.v56:1 |
[69] | Yamamoto Y, Yamamoto T. Personalization finder: a search interface for identifying and self-controlling web search personalization[C] //Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. 2020: 37-46. |
[70] | Shen X, Tan B, Zhai C X. Privacy protection in personalized search[C] //ACM SIGIR Forum. New York, NY, USA: ACM, 2007, 41(1): 4-17. |
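[71] | Zhang Q Y, Li C. Research on the protection of readers' personal information in smart libraries: From the perspective of the Personal Information Protection Law[J]. Library Work and Study, 2023(8): 36-42. (in Chinese) |
[72] | Kang Y H, Xiong L. A user anonymization method for personalized retrieval oriented to big data[J]. Journal of Xidian University, 2014, 41(5): 148-154+160. (in Chinese) |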
[73] | Chen G, Bai H, Shou L, et al. UPS: efficient privacy protection in personalized web search[C]// Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 2011: 615-624. |
[74] | Yao J, Dou Z, Wen J R. Fedps: A privacy protection enhanced personalized search framework[C]// Proceedings of the Web Conference 2021. 2021: 3757-3766. |
[75] | Tabrizi S A, Shakery A, Zamani H, et al. PERSON: Personalized information retrieval evaluation based on citation networks[J]. Information Processing & Management, 2018, 54(4): 630-656. doi: 10.1016/j.ipm.2018.04.004 |
[76] | Bassani E, Kasela P, Raganato A, et al. A multi-domain benchmark for personalized search evaluation[C]// Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022: 3822-3827. |
[77] | Bai Q, Zhang Q, Hu Q, et al. ECNU at CLEF PIR 2018: Evaluation of personalized information retrieval[C]// CLEF (Working Notes). 2018. |
[78] | Harpale A, Yang Y, Gopal S, et al. Citedata: a new multi-faceted dataset for evaluating personalized search performance[C]// Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010: 549-558. |
[79] | Dou Z, Song R, Wen J R, et al. Evaluating the effectiveness of personalized web search[J]. IEEE Transactions on Knowledge and Data Engineering, 2008, 21(8): 1178-1190. doi: 10.1109/TKDE.2008.172 |
[80] | Dou Z, Song R, Wen J R. A large-scale evaluation and analysis of personalized search strategies[C]// Proceedings of the 16th international conference on World Wide Web. 2007: 581-590. |
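[81] | Zhang X J. A comparative analysis of the personalization potential of informational, navigational and transactional queries[J]. Digital Library Forum, 2017(9): 35-41. (in Chinese) |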
[82] | Cole A, O'Brien H. Using data-prompted interviews in interactive information retrieval research: A reflection on the study of self-efficacy when learning using search[C]// Proceedings of the 2023 Conference on Human Information Interaction and Retrieval. 2023: 406-411. |
[83] | Liu Y, Han T, Ma S, et al. Summary of ChatGPT/GPT-4 research and perspective towards the future of large language models[OL]. arXiv preprint arXiv:2304.01852, 2023. |
[84] | Zhao C Y, Zhu G B, Wang J Q. Inspiration from ChatGPT for large language models and new development ideas for multimodal large models[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 26-35. (in Chinese) |
[85] | Chen J, Liu Z, Huang X, et al. When large language models meet personalization: Perspectives of challenges and opportunities[OL]. arXiv preprint arXiv:2307.16376, 2023. |
[86] | Ai Q, Bai T, Cao Z, et al. Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community[J]. AI Open, 2023, 4: 80-90. doi: 10.1016/j.aiopen.2023.08.001 |