TLDR: Extreme Summarization of Scientific Documents (cs.CL) --- 用户7236395 -- 瞎采新闻

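We introduce TLDR generation for scientific papers, a new automatic summarization task that involves high source compression and requires expert background knowledge and complex language understanding. To support research on this task, we introduce SciTLDR, a dataset of 3.9K TLDRs. We also introduce a novel annotation protocol that scalably curates additional gold summaries by rewriting peer review comments. We use this protocol to augment the test set with multiple gold TLDRs for evaluation, unlike most recent summarization datasets, which assume only one valid gold summary. Finally, we present a training strategy for adapting pretrained language models that exploits the similarity between TLDR generation and the related tasks of extreme summarization and title generation; it outperforms strong extractive and abstractive summarization baselines.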

Original title: TLDR: Extreme Summarization of Scientific Documents

Original abstract: We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression requiring expert background knowledge and complex language understanding. To facilitate research on this task, we introduce SciTLDR, a dataset of 3.9K TLDRs. Furthermore, we introduce a novel annotation protocol for scalably curating additional gold summaries by rewriting peer review comments. We use this protocol to augment our test set, yielding multiple gold TLDRs for evaluation, which is unlike most recent summarization datasets that assume only one valid gold summary. We present a training strategy for adapting pretrained language models that exploits similarities between TLDR generation and the related tasks of extreme summarization and title generation, which outperforms strong extractive and abstractive summarization baselines.

Authors: Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld

Original URL: https://arxiv.org/abs/2004.15011
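
The abstract notes that the test set carries multiple gold TLDRs per paper rather than a single reference. As a rough illustration of what multi-reference scoring can look like, the sketch below scores a candidate against several references and keeps the best match. Everything in it is an assumption made for illustration: the abstract does not say how scores are aggregated, the unigram-overlap F1 is a simplified stand-in for a ROUGE-style metric, and the names `unigram_f1` and `multi_reference_score` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased, whitespace-split tokens.

    A simplified stand-in for a ROUGE-1-style score, not the official metric.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate: str, references: Sequence[str]) -> float:
    """Score a candidate against several gold TLDRs.

    Assumption: the best-matching reference determines the score (max
    aggregation); averaging over references would be another option.
    """
    if not references:
        raise ValueError("at least one reference is required")
    return max(unigram_f1(candidate, ref) for ref in references)


if __name__ == "__main__":
    # Made-up example strings, not taken from SciTLDR.
    golds = [
        "a new task and dataset for extreme summarization of papers",
        "we introduce scitldr a dataset of short paper summaries",
    ]
    print(multi_reference_score("a dataset for extreme summarization of papers", golds))
```

Taking the maximum means a candidate is not penalized for matching one valid phrasing of the summary instead of another, which is the practical motivation for having more than one gold summary.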

TLDR_ 对科学文献的极端总结（CS.CL）.pdf --- from the Tencent Cloud community --- 用户7236395
