Original title: TLDR: Extreme Summarization of Scientific Documents
Abstract: We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression requiring expert background knowledge and complex language understanding. To facilitate research on this task, we introduce SciTLDR, a dataset of 3.9K TLDRs. Furthermore, we introduce a novel annotation protocol for scalably curating additional gold summaries by rewriting peer review comments. We use this protocol to augment our test set, yielding multiple gold TLDRs for evaluation, which is unlike most recent summarization datasets that assume only one valid gold summary. We present a training strategy for adapting pretrained language models that exploits similarities between TLDR generation and the related tasks of extreme summarization and title generation, which outperforms strong extractive and abstractive summarization baselines.
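Because the augmented test set provides several gold TLDRs per paper, a generated TLDR can be scored against each of them rather than against a single reference. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes the per-reference scores are aggregated by taking the maximum, and it uses a simplified ROUGE-1 F1 (lowercased whitespace tokens, no stemming) in place of the official ROUGE toolkit. It also computes a word-level compression ratio to make "high source compression" concrete. All function names are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and split on whitespace,
    # with no stemming or punctuation handling, unlike the official ROUGE.
    return text.lower().split()


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between one candidate and one reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per word (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_ref_rouge1(candidate: str, golds: list[str]) -> float:
    """Score against every gold TLDR and keep the best (assumed max aggregation)."""
    if not golds:
        raise ValueError("need at least one gold TLDR")
    return max(rouge1_f1(candidate, gold) for gold in golds)


def compression_ratio(source: str, summary: str) -> float:
    """Source length divided by summary length, counted in whitespace tokens."""
    return len(tokenize(source)) / len(tokenize(summary))


if __name__ == "__main__":
    # Toy gold TLDRs, invented for illustration only.
    golds = [
        "we introduce a dataset of tldrs for scientific papers",
        "a new task of extreme summarization for papers",
    ]
    candidate = "a dataset of tldrs for papers"
    print(round(multi_ref_rouge1(candidate, golds), 3))  # 0.8
```

Taking the maximum rewards a candidate that matches any one valid phrasing; averaging over references is an equally simple alternative if a single close match should not dominate the score.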
Authors: Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld
Original URL: https://arxiv.org/abs/2004.15011
Attachment: TLDR_ Extreme Summarization of Scientific Documents (CS.CL).pdf, shared by user 7236395 on the Tencent Cloud Community.