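We present CovidQA, an initial version of a question answering dataset designed specifically for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, it is the first publicly available resource of its type, and it is intended as a stopgap until more substantial evaluation resources become available.
The current version 0.1 release comprises only 124 question-article pairs, which is too few for supervised machine learning. We nevertheless believe it can be useful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. The paper describes the methodology used to construct the dataset and reports the effectiveness of several baselines, including term-based techniques and various transformer-based models. The dataset is available at this http URL.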
Original title: Rapidly Bootstrapping a Question Answering Dataset for COVID-19
Original abstract: We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at this http URL
Authors: Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin
Original URL: https://arxiv.org/abs/2004.11339
Rapidly Bootstrapping a Question Answering Dataset for COVID-19 (cs.CL).pdf, from Tencent Cloud Community user 7199428