Meta-evaluation of Conversational Search Evaluation Metrics
Liu, Zeyang; Zhou, Ke; Wilson, Max L.
Authors
Dr Ke Zhou (ke.zhou@nottingham.ac.uk), Assistant Professor
Dr Max Wilson (max.wilson@nottingham.ac.uk), Associate Professor
Abstract
Conversational search systems, such as Google Assistant and Microsoft Cortana, enable users to interact with search systems over multiple rounds of natural language dialogue. Evaluating such systems is challenging, given that arbitrary natural language responses can be generated and that users commonly interact over multiple semantically coherent rounds to accomplish a search task. Although prior studies have proposed many evaluation metrics, the extent to which those measures effectively capture user preference remains to be investigated. In this article, we systematically meta-evaluate a variety of conversational search metrics. We specifically study three perspectives on those metrics: (1) reliability: the ability to detect “actual” performance differences as opposed to those observed by chance; (2) fidelity: the ability to agree with ultimate user preference; and (3) intuitiveness: the ability to capture properties deemed important in the context of conversational search, namely adequacy, informativeness, and fluency. Through experiments on two test collections, we find that the performance of different metrics varies significantly across scenarios, while, consistent with prior studies, existing metrics achieve only weak correlation with ultimate user preference and satisfaction. METEOR is, comparatively speaking, the best existing single-turn metric when all three perspectives are considered. We also demonstrate that adapted session-based evaluation metrics can be used to measure multi-turn conversational search, achieving moderate concordance with user satisfaction. To our knowledge, our work establishes the most comprehensive meta-evaluation of conversational search to date.
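To make the fidelity perspective concrete, the sketch below scores a handful of single-turn responses with METEOR (using NLTK's implementation) and measures rank agreement with user preference via Kendall's tau. The response pairs, preference ratings, and variable names are invented for illustration and are not data from the paper; only the choice of metric and rank correlation follows the article.

```python
# A minimal sketch of a "fidelity" check: score system responses with METEOR,
# then measure agreement with user preference ratings via Kendall's tau.
# All dialogue data and ratings below are illustrative assumptions.
import nltk
from nltk.translate.meteor_score import meteor_score
from scipy.stats import kendalltau

nltk.download("wordnet", quiet=True)  # METEOR matches synonyms via WordNet

# Hypothetical (system response, reference response) pairs for single turns.
pairs = [
    ("the library opens at nine am", "the library opens at 9 am"),
    ("i could not find that film", "no results were found for that movie"),
    ("flights to paris start at 120 euros", "the cheapest paris flight costs 120 euros"),
]

# Hypothetical per-turn user preference ratings, e.g. from a side-by-side study.
user_preference = [0.9, 0.4, 0.7]

# Recent NLTK versions expect pre-tokenized input, hence .split().
scores = [meteor_score([ref.split()], hyp.split()) for hyp, ref in pairs]

tau, p = kendalltau(scores, user_preference)
print("METEOR scores:", [round(s, 3) for s in scores])
print(f"Kendall's tau vs. user preference: {tau:.3f} (p = {p:.3f})")
```

On real judgements, a weak tau is exactly the kind of result the abstract reports for existing single-turn metrics; the paper additionally tests reliability (significance of observed system differences) and intuitiveness (agreement with adequacy, informativeness, and fluency judgements) at scale.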
Citation
Liu, Z., Zhou, K., & Wilson, M. L. (2021). Meta-evaluation of Conversational Search Evaluation Metrics. ACM Transactions on Information Systems, 39(4), 1-42. https://doi.org/10.1145/3445029
| Field | Value |
| --- | --- |
| Journal Article Type | Article |
| Acceptance Date | Dec 1, 2020 |
| Online Publication Date | Jan 16, 2023 |
| Publication Date | Sep 1, 2021 |
| Deposit Date | Mar 5, 2025 |
| Publicly Available Date | Mar 6, 2025 |
| Journal | ACM Transactions on Information Systems |
| Print ISSN | 1046-8188 |
| Electronic ISSN | 1558-2868 |
| Publisher | Association for Computing Machinery (ACM) |
| Peer Reviewed | Peer Reviewed |
| Volume | 39 |
| Issue | 4 |
| Article Number | 52 |
| Pages | 1-42 |
| DOI | https://doi.org/10.1145/3445029 |
| Keywords | Computer Science Applications; General Business, Management and Accounting; Information Systems |
| Public URL | https://nottingham-repository.worktribe.com/output/16216936 |
| Publisher URL | https://dl.acm.org/doi/10.1145/3445029 |
Files
2104.13453v1 (PDF, 1.4 MB)
You might also like
- Meta-evaluation of online and offline web search evaluation metrics (2017), Presentation / Conference Contribution
- Detecting collusive spamming activities in community question answering (2017), Presentation / Conference Contribution
- Does document relevance affect the searcher's perception of time? (2017), Presentation / Conference Contribution
- Palimpsest: improving assisted curation of loco-specific literature (2016), Journal Article
- Predicting pre-click quality for native advertisements (2016), Presentation / Conference Contribution