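This post introduces RuBQ, the first Russian-language knowledge base question answering (KBQA) dataset. It consists of 1,500 Russian questions of varying complexity together with their English machine translations, SPARQL queries to Wikidata, and reference answers, plus a sample of Wikidata triples containing entities with Russian labels. The dataset was built from a large collection of question-answer pairs taken from online quizzes. That data was filtered automatically, entity-linked with the help of crowd workers, converted to SPARQL queries automatically, and then verified in-house.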
Original title: RuBQ: A Russian Dataset for Question Answering over Wikidata
Original abstract: The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
Original authors: Vladislav Korablinov, Pavel Braslavski
Original URL: https://arxiv.org/abs/2005.10659
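To make the abstract's list of components concrete, here is a minimal Python sketch of what one question record might look like. The class name, field names, helper function, and the sample question are all assumptions made for illustration and are not taken from the RuBQ release. Only the Wikidata identifiers are real: Q159 (Russia), Q649 (Moscow), and P36 (capital).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KbqaRecord:
    """Hypothetical record layout; field names are illustrative, not RuBQ's schema."""
    question_ru: str          # question in Russian
    question_en: str          # English translation of the question
    sparql: str               # SPARQL query against Wikidata
    answer_ids: List[str] = field(default_factory=list)  # reference answers as Wikidata IDs


def capital_query(country_qid: str) -> str:
    """Build a SPARQL query asking Wikidata for the capital (P36) of a country."""
    return f"SELECT ?answer WHERE {{ wd:{country_qid} wdt:P36 ?answer }}"


# Illustrative example only; this question is not claimed to be in the dataset.
record = KbqaRecord(
    question_ru="Какой город является столицей России?",
    question_en="Which city is the capital of Russia?",
    sparql=capital_query("Q159"),
    answer_ids=["Q649"],
)

print(record.sparql)
```

A query of this shape uses the `wd:` and `wdt:` prefixes, which the public Wikidata query service predefines. On other SPARQL endpoints they would likely need explicit `PREFIX` declarations.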
Attachment: RuBQ A Russian Dataset for Question Answering over Wikidata.pdf (posted by 刘子蔚, Tencent Cloud community)