Posted to events@mxnet.apache.org by apachemxnetday <ap...@nvidia.com> on 2020/11/23 16:45:14 UTC

FW: Submission for Apache MXNet Day: Optimizing Inference for Neural Machine Translation using Sockeye 2


From: Brenton Chu <bc...@nvidia.com>
Date: Thursday, November 12, 2020 at 2:16 PM
To: apachemxnetday <ap...@nvidia.com>
Subject: Submission for Apache MXNet Day: Optimizing Inference for Neural Machine Translation using Sockeye 2

Title:
Optimizing Inference for Neural Machine Translation using Sockeye 2

Abstract:
Transformer networks have revolutionized the field of machine translation and have been shown to produce better translations than traditional recurrent neural networks, especially for long input sentences.
However, inference with such models can become computationally intensive as the length of the output sentence grows. In this session, we will explore the Transformer-based model using Sockeye, the open-source NMT implementation that powers Amazon Translate. We will discuss how to profile deep learning workloads using NVIDIA Nsight Systems and identify areas for improving performance. We will also cover specific optimizations, including faster multi-head attention, support for lower-precision arithmetic, and beam search updates; together, these optimizations can provide up to a 15x speed-up over a comparable CPU instance. All the relevant changes are available as part of the latest releases of Apache MXNet and the Amazon Sockeye framework.
Finally, we will demonstrate the impact of these optimization techniques by showing the most cost-effective inference to date on an Amazon EC2 G4 instance with NVIDIA T4 GPUs.
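As background for the beam search updates mentioned above, here is a minimal, framework-free sketch of standard beam search decoding. The toy vocabulary and scoring function are hypothetical stand-ins, not Sockeye's implementation; a real NMT system would compute next-token log-probabilities with a Transformer decoder step and typically apply length normalization when comparing finished hypotheses.

```python
import math

# Hypothetical toy next-token log-probabilities; a real model would
# run a decoder step here. This toy distribution simply prefers
# tokens that appear later in the vocabulary.
def step_logprobs(prefix, vocab):
    scores = [len(prefix) * 0.1 + i for i, _ in enumerate(vocab)]
    z = math.log(sum(math.exp(s) for s in scores))  # log-softmax normalizer
    return [s - z for s in scores]

def beam_search(vocab, eos, beam_size=2, max_len=5):
    """Return the highest-scoring token sequence ending in `eos`."""
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in zip(vocab, step_logprobs(prefix, vocab)):
                cand = (prefix + [tok], score + lp)
                # Hypotheses that emit end-of-sentence are set aside.
                (finished if tok == eos else candidates).append(cand)
        # Keep only the top-k partial hypotheses at each step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams at max_len
    return max(finished, key=lambda c: c[1])[0]

print(beam_search(["a", "b", "</s>"], "</s>"))
```

The per-step top-k pruning is what makes beam search cheaper than exhaustive search, and it is also where GPU-side optimizations (batched top-k, avoiding host-device synchronization) pay off in practice.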

Speaker:
Brenton Chu