Title: Imitation Attacks and Defenses for Black-box Machine Translation Systems

Abstract: We consider an adversary looking to steal or attack a black-box machine translation (MT) system, either for financial gain or to exploit model errors. We first show that black-box MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs. Using simulated experiments, we demonstrate that MT model stealing is possible even when imitation models have different input data or architectures than their victims. Applying these ideas, we train imitation models that reach within 0.6 BLEU of three production MT systems on both high-resource and low-resource language pairs. We then leverage the similarity of our imitation models to transfer adversarial examples to the production systems. We use gradient-based attacks that expose inputs which lead to semantically incorrect translations, dropped content, and vulgar model outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models. This defense degrades imitation model BLEU and attack transfer rates, at some cost in BLEU and inference speed.
Authors: Eric Wallace, Mitchell Stern, Dawn Song
Link: https://arxiv.org/abs/2004.15015
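The stealing step described in the abstract — querying the black-box system with monolingual sentences and recording its outputs as pseudo-parallel training data — can be sketched as below. This is a minimal illustration, not the authors' code: `query_victim` is a hypothetical stand-in for the real MT API (here a toy word-level lookup), and the collected pairs would in practice be fed to a standard seq2seq training pipeline to fit the imitation model.

```python
def query_victim(sentence: str) -> str:
    """Stand-in for the black-box MT API (hypothetical); a toy word-level 'translation'."""
    toy_lexicon = {"hello": "bonjour", "world": "monde", "cat": "chat"}
    return " ".join(toy_lexicon.get(w, w) for w in sentence.lower().split())

def collect_imitation_data(monolingual_corpus):
    """Query the victim on monolingual source sentences and keep (source, output)
    pairs as pseudo-parallel data for training an imitation model."""
    return [(src, query_victim(src)) for src in monolingual_corpus]

corpus = ["hello world", "hello cat"]
pairs = collect_imitation_data(corpus)
# `pairs` now plays the role of a parallel corpus for imitation training.
```

Note that the monolingual corpus need not come from the victim's training distribution, and the imitation model need not share the victim's architecture; the paper's simulated experiments show stealing still works under both mismatches.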
Source: Tencent Cloud community, user 7236395