Title: BERTweet: A pre-trained language model for English Tweets
Abstract: We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet is trained using the RoBERTa pre-training procedure (Liu et al., 2019), with the same model configuration as BERT-base (Devlin et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition, and text classification. We release BERTweet to facilitate future research and downstream applications on Tweet data. Our BERTweet is available at: this https URL
Authors: Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
Paper: https://arxiv.org/abs/2005.10200
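Since the abstract notes that BERTweet is released for downstream applications, the sketch below shows one way to extract its contextual embeddings with the Hugging Face `transformers` library. The model id `vinai/bertweet-base` and the `normalization=True` tokenizer option are assumptions not stated in this post; check the authors' release page for the official usage.

```python
# Minimal sketch: extracting BERTweet features via Hugging Face transformers.
# Assumptions (not stated in this post): the checkpoint is published on the
# Hub as "vinai/bertweet-base", and the tokenizer accepts normalization=True
# to apply the authors' Tweet normalization (user mentions -> @USER,
# links -> HTTPURL; requires the `emoji` package at runtime).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)
model = AutoModel.from_pretrained("vinai/bertweet-base")

tweet = "SC has first two presumptive cases of coronavirus, DHEC confirms via @USER"
inputs = tokenizer(tweet, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Per the abstract, the model uses the BERT-base configuration, so the
# contextual embeddings have shape (batch_size, sequence_length, 768).
print(outputs.last_hidden_state.shape)
```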