The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs (cs.CL) --- Liu Ziwei

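This paper introduces the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin used both to lemmatize Latin texts and to post-edit the resulting lemmatizations. The authors describe recent advances in lemmatizers and evaluate them on the Capitularies corpus, a collection of Frankish royal edicts from the mid-6th to the mid-9th century that was built as a reference for processing Medieval Latin. They also examine the post-correction of lemmatizations through a limited crowdsourcing process intended to keep the FLL under continuous review and update. Starting from the lemmatized texts, they extend the FLL with word embeddings, which can be traversed interactively through SemioGraphs, closing the digitally enhanced hermeneutic circle. The paper thus argues for a broader view of lemmatization that covers classical machine learning, intellectual post-correction and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.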

Original title: The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

Original abstract: In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digital enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.
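To make the pipeline in the abstract more concrete, here is a minimal Python sketch of its three stages: lexicon-based lemmatization, manual post-correction, and a nearest-neighbor graph over word embeddings. It is an illustration under stated assumptions, not the authors' implementation. The lexicon entries, the correction table, the two-dimensional vectors, and all function names are hypothetical toy values chosen for this example; none of them come from the FLL or from SemioGraphs.

```python
from math import sqrt

# Hypothetical toy lexicon: inflected form -> lemma (not taken from the FLL).
TOY_LEXICON = {
    "regis": "rex", "regem": "rex", "reges": "rex",
    "legis": "lex", "legem": "lex", "leges": "lex",
}

# Hypothetical toy embeddings for three lemmas (made-up 2-d vectors).
TOY_VECTORS = {
    "rex": [1.0, 0.1],
    "imperator": [0.9, 0.2],
    "lex": [0.1, 1.0],
}


def lemmatize(tokens, lexicon):
    """Look up each token in the lexicon; unknown forms get lemma None."""
    return [(tok, lexicon.get(tok.lower())) for tok in tokens]


def apply_corrections(pairs, corrections):
    """Overwrite lemmas with reviewer-supplied corrections where present."""
    return [(tok, corrections.get(tok.lower(), lemma)) for tok, lemma in pairs]


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def neighbor_graph(vectors, k=1):
    """Map each word to its k most similar other words by cosine similarity."""
    graph = {}
    for word, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != word),
            reverse=True,
        )
        graph[word] = [other for _, other in scored[:k]]
    return graph


pairs = lemmatize(["Regis", "capitulare"], TOY_LEXICON)
# "capitulare" is missing from the toy lexicon, so a reviewer supplies it.
corrected = apply_corrections(pairs, {"capitulare": "capitulare"})
graph = neighbor_graph(TOY_VECTORS, k=1)
```

In this sketch, unknown forms surface as `None` so that a reviewer can spot and correct them, which loosely mirrors the post-correction step the abstract describes. The graph is a plain dictionary of nearest neighbors; an interactive tool would render and traverse such a structure rather than print it.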

Authors: Alexander Mehler, Bernhard Jussen, Tim Geelhaar, Alexander Henlein, Giuseppe Abrami, Daniel Baumartz, Tolga Uslu, Wahed Hemati

Source: https://arxiv.org/abs/2005.10790

Attachment: From Morphological Expansion and Word Embeddings to SemioGraphs.pdf (from the Tencent Cloud Community, posted by Liu Ziwei)
