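Extreme multi-label classification (XMC) is the problem of finding the labels relevant to an input from a very large set of candidate labels. This paper studies XMC in the setting where labels are available only for groups of samples, not for individual samples. Existing XMC methods are not designed for such multi-instance multi-label (MIML) training data, and existing MIML methods do not scale to XMC problem sizes. The authors develop a new, scalable algorithm that imputes per-sample labels from the group labels; it can be paired with any existing XMC method to handle the aggregated-label problem. They characterize the algorithm's statistical properties under mild assumptions and, as an extension, provide a new end-to-end framework for MIML. Experiments on both aggregated-label XMC and MIML tasks show advantages over existing approaches.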
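To make the aggregated-label setting concrete, here is a minimal sketch of the data layout together with a naive imputation baseline. This is not the algorithm from the paper, whose details the abstract does not give. The function name `impute_labels`, the prototype-matching heuristic, and the toy data are all assumptions made for illustration only.

```python
from collections import defaultdict


def impute_labels(groups):
    """Naive baseline (hypothetical, NOT the paper's method).

    groups: list of (samples, labels) pairs, where samples is a list of
    equal-length feature vectors and labels is the set of labels known
    only for the group as a whole.
    Returns, for each group, a list with one set of imputed labels per sample.
    """
    dim = len(groups[0][0][0])

    # Step 1: build one prototype per label, the mean feature vector of
    # every sample that appears in a group carrying that label.
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for samples, labels in groups:
        for label in labels:
            for x in samples:
                for j, value in enumerate(x):
                    sums[label][j] += value
                counts[label] += 1
    prototypes = {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

    # Step 2: inside each group, hand each group label to the single
    # sample whose features have the largest dot product with the prototype.
    imputed = []
    for samples, labels in groups:
        per_sample = [set() for _ in samples]
        for label in sorted(labels):
            scores = [
                sum(a * b for a, b in zip(x, prototypes[label]))
                for x in samples
            ]
            best = max(range(len(samples)), key=scores.__getitem__)
            per_sample[best].add(label)
        imputed.append(per_sample)
    return imputed


# Toy data (made up): label "a" tends to co-occur with feature 0,
# label "b" with feature 1. Labels are attached to groups, not samples.
groups = [
    ([[1.0, 0.0], [0.0, 1.0]], {"a", "b"}),
    ([[1.0, 0.0], [1.0, 0.0]], {"a"}),
    ([[0.0, 1.0], [0.0, 1.0]], {"b"}),
]
imputed = impute_labels(groups)
print(imputed[0])
```

The imputed per-sample labels could then be passed to any standard XMC trainer, which is the pairing the abstract describes. The heuristic here assigns each group label to exactly one sample, a simplification that the abstract does not state.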
Original title: Extreme Multi-label Classification from Aggregated Labels
Original abstract: Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.
Authors: Yanyao Shen, Hsiang-fu Yu, Sujay Sanghavi, Inderjit Dhillon
Original URL: https://arxiv.org/abs/2004.00198
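Attachment: 聚合标签的极端多标签分类(CS.LG).pdf (Extreme Multi-label Classification from Aggregated Labels, cs.LG), from the Tencent Cloud community, posted by 蔡小雪7100294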