Posted to discuss-archive@tvm.apache.org by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai> on 2020/09/11 02:13:26 UTC

[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM


We tried to import the RNN-T PyTorch model (https://github.com/mlperf/inference/tree/master/v0.7/speech_recognition/rnnt/pytorch) into TVM.

Pre-trained RNN-T model for MLPerf Inference: https://zenodo.org/record/3662521

We hit the following error:

NotImplementedError: The following operators are not implemented: ['prim::RaiseException', 'prim::Uninitialized', 'aten::lstm', 'aten::_cast_Int', 'aten::__derive_index', 'aten::not', 'aten::tensor', 'aten::item', 'aten::cast_Float', 'prim::data', 'aten::format', 'aten::copy', 'aten::append', 'aten::FloatImplicit', 'prim::dtype', 'prim::shape', 'prim::TupleIndex', 'aten::dim', 'aten::warn', 'aten::is', 'prim::unchecked_cast']

How can we get past these errors and run RNN-T on TVM?
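
For reference, a minimal sketch (not part of our script; `list_graph_ops` is a hypothetical helper) of how one can list the operator kinds in a TorchScript graph before handing it to TVM, to see up front which converters would be needed:

```python
import torch

def list_graph_ops(script_module):
    """List the aten/prim operator kinds at the top level of a TorchScript graph.
    Note: this does not recurse into prim::If / prim::Loop blocks."""
    return sorted({node.kind() for node in script_module.graph.nodes()})

# e.g. scripted = torch.jit.script(model.encoder)
#      print(list_graph_ops(scripted))
```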





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/1) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

Hi, I have copied and pasted four scripts below. The RNN is defined in 'rnn.py' and the RNN-T model in 'model_rnnt.py'.
I tried to import the RNN-T model into TVM; please see the first script.
In the function get_rnnt_model I load the pre-trained RNNT model.
In the function rnnt_model_to_tvm_mod I try to convert it into a TVM module.
```
def get_rnnt_model(featurizer_config, model_definition, ctc_vocab, ckpt):
    model = RNNT(
        feature_config=featurizer_config,
        rnnt=model_definition['rnnt'],
        num_classes=len(ctc_vocab)
    )
    checkpoint = torch.load(ckpt, map_location="cpu")
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.eval()
    return model

def rnnt_model_to_tvm_mod(model):
    input_shape = (316, 1, 240)
    len_shape = (316,)  # trailing comma so this is a 1-element tuple, not the int 316
    t_audio_signal_e = torch.randn(input_shape)
    t_a_sig_length_e = torch.randn(len_shape)
    model.encoder = torch.jit.trace(model.encoder, (t_audio_signal_e, t_a_sig_length_e)).eval()

    mod, params = relay.frontend.from_pytorch(model.encoder, input_shapes=None)
    mod = relay.transform.RemoveUnusedFunctions()(mod)
    return mod, params
```
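
As a point of comparison, here is a minimal sketch of the same conversion with explicit input shapes passed to the frontend. The input names 'input0'/'input1' are assumptions and must match the traced graph's input names; this is not a verified fix for the missing-operator errors, just the usual calling pattern:

```python
import torch
from tvm import relay

def rnnt_encoder_to_tvm_mod(model):
    input_shape = (316, 1, 240)   # (T, B, F)
    len_shape = (316,)            # 1-element tuple

    t_audio_signal_e = torch.randn(input_shape)
    t_a_sig_length_e = torch.randn(len_shape)

    traced = torch.jit.trace(model.encoder, (t_audio_signal_e, t_a_sig_length_e)).eval()

    # from_pytorch takes a list of (input name, shape) pairs as its second argument;
    # the names below are placeholders.
    shape_list = [("input0", input_shape), ("input1", len_shape)]
    mod, params = relay.frontend.from_pytorch(traced, shape_list)
    mod = relay.transform.RemoveUnusedFunctions()(mod)
    return mod, params
```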





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/9) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
import toml
import torch
import torchvision
import argparse
from tvm import relay
from tqdm import tqdm
import tvm
from model_rnnt import RNNT
from decoders import TransducerDecoder
from preprocessing import AudioPreprocessing
from dataset import AudioToTextDataLayer
from helpers import process_evaluation_batch, process_evaluation_epoch, add_blank_label, print_dict

def get_args():
    """Parse commandline."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ckpt', type=str, required=True, help='The rnnt model path (pytorch ckpt path)')
    parser.add_argument('--dataset-path', type=str, required=True, help='The dataset path')
    parser.add_argument("--val_manifest", type=str, required=True,
                        help='relative path to evaluation dataset manifest file')
    parser.add_argument("--model_toml", type=str, default='configs/rnnt.toml',
                        help='relative model configuration path given dataset folder')
    parser.add_argument("--steps", default=100,
                        help='if not specified do evaluation on full dataset. '
                             'otherwise only evaluates the specified number of iterations for each worker', type=int)
    parser.add_argument('--batch_size', type=int, default=1, help='The batchsize to inference')
    parser.add_argument('--target', type=str, default='cuda', help='The target in TVM')
    parser.add_argument('--network', type=str, default='rnnt', help='The name of network')
    parser.add_argument('--cs', type=str, default=80, help='The number of calibration dataset')
    args = parser.parse_args()
    return args


class RNNTGreedyDecoder(TransducerDecoder):
    """A greedy transducer decoder.

    Args:
        blank_symbol: See `Decoder`.
        model: Model to use for prediction.
        max_symbols_per_step: The maximum number of symbols that can be added
            to a sequence in a single time step; if set to None then there is
            no limit.
        cutoff_prob: Skip to next step in search if current highest character
            probability is less than this.
    """

    def __init__(self, blank_index, model, mod=None, params=None, target='cuda', max_symbols_per_step=30):
        super().__init__(blank_index, model)
        assert max_symbols_per_step is None or max_symbols_per_step > 0
        self.mod = mod
        self.params = params
        self.target = target
        self.ctx = tvm.context(target)
        self.max_symbols = max_symbols_per_step

    def decode(self, x, out_lens):
        """Returns a list of sentences given an input batch.

        Args:
            x: A tensor of size (batch, channels, features, seq_len)
                TODO was (seq_len, batch, in_features).
            out_lens: list of int representing the length of each sequence
                output sequence.

        Returns:
            list containing batch number of sentences (strings).
        """
        with torch.no_grad():
            # Apply optional preprocessing

            # x_packed = torch.nn.utils.rnn.pack_padded_sequence(x, out_lens)
            # logits, out_lens = self._model.encode(x, out_lens)
            logits, out_lens = self._model.encoder(x, out_lens)
            # executor = relay.create_executor('graph', self.mod, self.ctx, self.target)
            # logits, out_lens = executor.evaluate(self.mod["main"])(x, out_lens)

            output = []
            for batch_idx in range(logits.size(0)):
                inseq = logits[batch_idx, :, :].unsqueeze(1)
                logitlen = out_lens[batch_idx]
                sentence = self._greedy_decode(inseq, logitlen)
                output.append(sentence)

        return output

    def _greedy_decode(self, x, out_len):
        training_state = self._model.training
        self._model.eval()

        device = x.device

        hidden = None
        label = []
        for time_idx in range(out_len):
            f = x[time_idx, :, :].unsqueeze(0)

            not_blank = True
            symbols_added = 0

            while not_blank and (self.max_symbols is None or symbols_added < self.max_symbols):
                g, hidden_prime = self._pred_step(
                    self._get_last_symb(label),
                    hidden,
                    device
                )
                logp = self._joint_step(f, g, log_normalize=False)[0, :]

                # get index k, of max prob
                v, k = logp.max(0)
                k = k.item()

                if k == self._blank_id:
                    not_blank = False
                else:
                    label.append(k)
                    hidden = hidden_prime
                symbols_added += 1

        self._model.train(training_state)
        return label

def get_rnnt_model(featurizer_config, model_definition, ctc_vocab, ckpt):
    model = RNNT(
        feature_config=featurizer_config,
        rnnt=model_definition['rnnt'],
        num_classes=len(ctc_vocab)
    )
    checkpoint = torch.load(ckpt, map_location="cpu")
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.eval()
    return model

def rnnt_model_to_tvm_mod(model):
    input_shape = (316, 1, 240)
    len_shape = (316,)  # trailing comma so this is a 1-element tuple, not the int 316
    t_audio_signal_e = torch.randn(input_shape)
    t_a_sig_length_e = torch.randn(len_shape)
    model.encoder = torch.jit.trace(model.encoder, (t_audio_signal_e, t_a_sig_length_e)).eval()

    mod, params = relay.frontend.from_pytorch(model.encoder, input_shapes=None)
    mod = relay.transform.RemoveUnusedFunctions()(mod)
    return mod, params


def get_data_layer(args, featurizer_config, val_manifest, dataset_vocab):
    data_layer = AudioToTextDataLayer(
        dataset_dir=args.dataset_path,
        featurizer_config=featurizer_config,
        manifest_filepath=val_manifest,
        labels=dataset_vocab,
        batch_size=args.batch_size,
        pad_to_max=featurizer_config['pad_to'] == "max",
        shuffle=False)
    return data_layer

def get_audio_preprocessor(featurizer_config):
    audio_preprocessor = AudioPreprocessing(**featurizer_config)
    audio_preprocessor.featurizer.normalize = "per_feature"
    audio_preprocessor.eval()
    return audio_preprocessor

def get_eval_transforms(audio_preprocessor):
    eval_transforms = []
    eval_transforms.append(lambda xs: [*audio_preprocessor(xs[0:2]), *xs[2:]])
    # These are just some very confusing transposes, that's all.
    # BxFxT -> TxBxF
    eval_transforms.append(lambda xs: [xs[0].permute(2, 0, 1), *xs[1:]])
    eval_transforms = torchvision.transforms.Compose(eval_transforms)
    return eval_transforms


def eval(data_layer, audio_processor, encoderdecoder, greedy_decoder, labels, args):
    """performs inference / evaluation
        Args:
            data_layer: data layer object that holds data loader
            audio_processor: data processing module
            encoderdecoder: acoustic model
            greedy_decoder: greedy decoder
            labels: list of labels as output vocabulary
            args: script input arguments
        """
    encoderdecoder.eval()
    with torch.no_grad():
        _global_var_dict = {
            'predictions': [],
            'transcripts': [],
            'logits': [],
        }

        for it, data in enumerate(tqdm(data_layer.data_iterator)):
            (t_audio_signal_e, t_a_sig_length_e,
             transcript_list, t_transcript_e,
             t_transcript_len_e) = audio_processor(data)

            t_predictions_e = greedy_decoder.decode(
                t_audio_signal_e, t_a_sig_length_e)

            values_dict = dict(
                predictions=[t_predictions_e],
                transcript=transcript_list,
                transcript_length=t_transcript_len_e,
            )
            process_evaluation_batch(
                values_dict, _global_var_dict, labels=labels)

            if args.steps is not None and it + 1 >= args.steps:
                break
        wer = process_evaluation_epoch(_global_var_dict)
        print("==========>>>>>>Evaluation WER: {0}\n".format(wer))


def main():
    args = get_args()
    model_definition = toml.load(args.model_toml)
    dataset_vocab = model_definition['labels']['labels']
    ctc_vocab = add_blank_label(dataset_vocab)
    featurizer_config = model_definition['input_eval']

    rnnt_model = get_rnnt_model(featurizer_config, model_definition, ctc_vocab, args.ckpt)
    mod, params = rnnt_model_to_tvm_mod(rnnt_model)
    data_layer = get_data_layer(args, featurizer_config, args.val_manifest, dataset_vocab)
    audio_preprocessor = get_audio_preprocessor(featurizer_config)
    eval_transforms = get_eval_transforms(audio_preprocessor)
    greedy_decoder = RNNTGreedyDecoder(len(ctc_vocab) - 1, rnnt_model, mod=None, params=None, target=args.target)
    eval(data_layer=data_layer,
        audio_processor=eval_transforms,
        encoderdecoder=rnnt_model,
        greedy_decoder=greedy_decoder,
        labels=ctc_vocab,
        args=args)

if __name__ == '__main__':
    main()
```





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/5) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

rnn.py
```
# Copyright (c) 2019, Myrtle Software Limited. All rights reserved.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
from typing import Optional, Tuple


def rnn(rnn, input_size, hidden_size, num_layers, norm=None,
        forget_gate_bias=1.0, dropout=0.0, **kwargs):
    """TODO"""
    if rnn != "lstm":
        raise ValueError(f"Unknown rnn={rnn}")
    if norm not in [None]:
        raise ValueError(f"unknown norm={norm}")

    if rnn == "lstm":
        return LstmDrop(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            forget_gate_bias=forget_gate_bias,
            **kwargs
        )


class LstmDrop(torch.nn.Module):

    def __init__(self, input_size, hidden_size, num_layers, dropout, forget_gate_bias,
                 **kwargs):
        """Returns an LSTM with forget gate bias init to `forget_gate_bias`.

        Args:
            input_size: See `torch.nn.LSTM`.
            hidden_size: See `torch.nn.LSTM`.
            num_layers: See `torch.nn.LSTM`.
            dropout: See `torch.nn.LSTM`.
            forget_gate_bias: For each layer and each direction, the value to
                initialise the forget gate bias to.

        Returns:
            A `torch.nn.LSTM`.
        """
        super(LstmDrop, self).__init__()

        # Interesting, torch LSTM allows specifying number of
        # layers... Fan-out parallelism.
        # WARNING: Is dropout repeated twice?
        self.lstm = torch.nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
        )
        if forget_gate_bias is not None:
            for name, v in self.lstm.named_parameters():
                if "bias_ih" in name:
                    bias = getattr(self.lstm, name)
                    bias.data[hidden_size:2 * hidden_size].fill_(forget_gate_bias)
                if "bias_hh" in name:
                    bias = getattr(self.lstm, name)
                    bias.data[hidden_size:2 * hidden_size].fill_(0)

        self.inplace_dropout = (torch.nn.Dropout(dropout, inplace=True)
                                if dropout else None)

    #zzhen developed
    def forward(self, x: torch.Tensor, h: Optional[Tuple[torch.Tensor, torch.Tensor]] = None):
        x, h = self.lstm(x, h)
        if self.inplace_dropout is not None:
            self.inplace_dropout(x.data)

        return x, h


class StackTime(torch.nn.Module):
    def __init__(self, factor):
        super().__init__()
        self.factor = int(factor)

    def forward(self, x, x_lens):
        # T, B, U
        # x, x_lens = x
        seq = [x]
        for i in range(1, self.factor):
            # This doesn't seem to make much sense...
            tmp = torch.zeros_like(x)
            tmp[:-i, :, :] = x[i:, :, :]
            seq.append(tmp)
        x_lens = torch.ceil(x_lens.float() / self.factor).int()
        # Gross, this is horrible. What a waste of memory...
        return torch.cat(seq, dim=2)[::self.factor, :, :], x_lens
```





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/7) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

There are some other files: "helpers.py", "metrics.py", "preprocessing.py", and "dataset.py".

helpers.py:
```
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019, Myrtle Software Limited. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from enum import Enum
from metrics import word_error_rate


class Optimization(Enum):
    """Various levels of Optimization.
    WARNING: This might have effect on model accuracy."""
    nothing = 0
    mxprO0 = 1
    mxprO1 = 2
    mxprO2 = 3
    mxprO3 = 4


AmpOptimizations = {Optimization.mxprO0: "O0",
                    Optimization.mxprO1: "O1",
                    Optimization.mxprO2: "O2",
                    Optimization.mxprO3: "O3"}


def add_blank_label(labels):
    if not isinstance(labels, list):
        raise ValueError("labels must be a list of symbols")
    labels.append("<BLANK>")
    return labels


def __rnnt_decoder_predictions_tensor(tensor, labels):
    """
    Takes output of greedy rnnt decoder and converts to strings.
    Args:
        tensor: model output tensor
        label: A list of labels
    Returns:
        prediction
    """
    hypotheses = []
    labels_map = dict([(i, labels[i]) for i in range(len(labels))])
    # iterate over batch
    for ind in range(len(tensor)):
        hypothesis = ''.join([labels_map[c] for c in tensor[ind]])
        hypotheses.append(hypothesis)
    return hypotheses


def __gather_predictions(predictions_list: list, labels: list) -> list:
    results = []
    for prediction in predictions_list:
        results += __rnnt_decoder_predictions_tensor(prediction, labels=labels)
    return results


def __gather_transcripts(transcript_list: list, transcript_len_list: list,
                         labels: list) -> list:
    results = []
    labels_map = dict([(i, labels[i]) for i in range(len(labels))])
    for i, t in enumerate(transcript_list):
        target = t.numpy().tolist()
        reference = ''.join([labels_map[c] for c in target])
        results.append(reference)
    return results


def process_evaluation_batch(tensors: dict, global_vars: dict, labels: list):
    """
    Processes results of an iteration and saves it in global_vars
    Args:
        tensors: dictionary with results of an evaluation iteration, e.g. loss, predictions, transcript, and output
        global_vars: dictionary where processes results of iteration are saved
        labels: A list of labels
    """
    for kv, v in tensors.items():
        if kv.startswith('predictions'):
            global_vars['predictions'] += __gather_predictions(
                v, labels=labels)
        elif kv.startswith('transcript_length'):
            transcript_len_list = v
        elif kv.startswith('transcript'):
            transcript_list = v

    global_vars['transcripts'] += __gather_transcripts(transcript_list,
                                                       transcript_len_list,
                                                       labels=labels)


def process_evaluation_epoch(global_vars: dict, tag=None):
    """
    Processes results from each worker at the end of evaluation and combine to final result
    Args:
        global_vars: dictionary containing information of entire evaluation
    Return:
        wer: final word error rate
        loss: final loss
    """
    hypotheses = global_vars['predictions']
    references = global_vars['transcripts']

    wer, scores, num_words = word_error_rate(
        hypotheses=hypotheses, references=references)
    return wer


def print_dict(d):
    maxLen = max([len(ii) for ii in d.keys()])
    fmtString = '\t%' + str(maxLen) + 's : %s'
    print('Arguments:')
    for keyPair in sorted(d.items()):
        print(fmtString % keyPair)

```

metrics.py
```
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import List


def __levenshtein(a: List, b: List) -> int:
    """Calculates the Levenshtein distance between a and b.
    """
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a, b = b, a
        n, m = m, n

    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)

    return current[n]


def word_error_rate(hypotheses: List[str], references: List[str]) -> float:
    """
    Computes Average Word Error rate between two texts represented as
    corresponding lists of string. Hypotheses and references must have same length.

    Args:
        hypotheses: list of hypotheses
        references: list of references

    Returns:
        (float) average word error rate
    """
    scores = 0
    words = 0
    if len(hypotheses) != len(references):
        raise ValueError("In word error rate calculation, hypotheses and reference"
                         " lists must have the same number of elements. But I got:"
                         "{0} and {1} correspondingly".format(len(hypotheses), len(references)))
    for h, r in zip(hypotheses, references):
        h_list = h.split()
        r_list = r.split()
        words += len(r_list)
        scores += __levenshtein(h_list, r_list)
    if words != 0:
        wer = (1.0 * scores) / words
    else:
        wer = float('inf')
    return wer, scores, words

```

preprocessing.py
```
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch.nn as nn

from helpers import Optimization
from parts.features import FeatureFactory


class AudioPreprocessing(nn.Module):
    """GPU accelerated audio preprocessing
    """

    def __init__(self, **kwargs):
        nn.Module.__init__(self)    # For PyTorch API
        self.optim_level = kwargs.get(
            'optimization_level', Optimization.nothing)
        self.featurizer = FeatureFactory.from_config(kwargs)

    def forward(self, x):
        input_signal, length = x
        length.requires_grad_(False)
        processed_signal = self.featurizer(x)
        processed_length = self.featurizer.get_seq_len(length)
        return processed_signal, processed_length

```
dataset.py
```
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This file contains classes and functions related to data loading
"""
from collections import namedtuple
import torch
import numpy as np
from torch.utils.data import Dataset
from parts.manifest import Manifest
from parts.features import WaveformFeaturizer


def seq_collate_fn(batch):
    """batches samples and returns as tensors
    Args:
    batch : list of samples
    Returns
    batches of tensors
    """
    audio_lengths = torch.LongTensor([sample.waveform.size(0)
                                      for sample in batch])
    transcript_lengths = torch.LongTensor([sample.transcript.size(0)
                                           for sample in batch])
    permute_indices = torch.argsort(audio_lengths, descending=True)

    audio_lengths = audio_lengths[permute_indices]
    transcript_lengths = transcript_lengths[permute_indices]
    padded_audio_signals = torch.nn.utils.rnn.pad_sequence(
        [batch[i].waveform for i in permute_indices],
        batch_first=True
    )
    transcript_list = [batch[i].transcript
                       for i in permute_indices]
    # from IPython import embed; embed()
    # transcripts = torch.cat(transcript_list)
    packed_transcripts = torch.nn.utils.rnn.pack_sequence(transcript_list,
                                                          enforce_sorted=False)

    # TODO: Don't I need to stop grad at some point now?
    return (padded_audio_signals, audio_lengths, transcript_list,
            packed_transcripts, transcript_lengths)


class AudioToTextDataLayer:
    """Data layer with data loader
    """

    def __init__(self, **kwargs):
        self._device = torch.device("cuda")

        featurizer_config = kwargs['featurizer_config']
        pad_to_max = kwargs.get('pad_to_max', False)
        perturb_config = kwargs.get('perturb_config', None)
        manifest_filepath = kwargs['manifest_filepath']
        dataset_dir = kwargs['dataset_dir']
        labels = kwargs['labels']
        batch_size = kwargs['batch_size']
        drop_last = kwargs.get('drop_last', False)
        shuffle = kwargs.get('shuffle', True)
        min_duration = featurizer_config.get('min_duration', 0.1)
        max_duration = featurizer_config.get('max_duration', None)
        normalize_transcripts = kwargs.get('normalize_transcripts', True)
        trim_silence = kwargs.get('trim_silence', False)
        sampler_type = kwargs.get('sampler', 'default')
        speed_perturbation = featurizer_config.get('speed_perturbation', False)
        sort_by_duration = sampler_type == 'bucket'
        self._featurizer = WaveformFeaturizer.from_config(
            featurizer_config, perturbation_configs=perturb_config)
        self._dataset = AudioDataset(
            dataset_dir=dataset_dir,
            manifest_filepath=manifest_filepath,
            labels=labels, blank_index=len(labels),
            sort_by_duration=sort_by_duration,
            pad_to_max=pad_to_max,
            featurizer=self._featurizer, max_duration=max_duration,
            min_duration=min_duration, normalize=normalize_transcripts,
            trim=trim_silence, speed_perturbation=speed_perturbation)

        print('sort_by_duration', sort_by_duration)

        self._dataloader = torch.utils.data.DataLoader(
            dataset=self._dataset,
            batch_size=batch_size,
            collate_fn=lambda b: seq_collate_fn(b),
            drop_last=drop_last,
            shuffle=shuffle,
            num_workers=0,
            pin_memory=True,
            sampler=None
        )

    def __len__(self):
        return len(self._dataset)

    @property
    def data_iterator(self):
        return self._dataloader


class AudioDataset(Dataset):
    def __init__(self, dataset_dir, manifest_filepath, labels, featurizer, max_duration=None, pad_to_max=False,
                 min_duration=None, blank_index=0, max_utts=0, normalize=True, sort_by_duration=False,
                 trim=False, speed_perturbation=False):
        """Dataset that loads tensors via a json file containing paths to audio files, transcripts, and durations
        (in seconds). Each entry is a different audio sample.
        Args:
            dataset_dir: absolute path to dataset folder
            manifest_filepath: relative path from dataset folder to manifest json as described above. Can be coma-separated paths.
            labels: String containing all the possible characters to map to
            featurizer: Initialized featurizer class that converts paths of audio to feature tensors
            max_duration: If audio exceeds this length, do not include in dataset
            min_duration: If audio is less than this length, do not include in dataset
            pad_to_max: if specified input sequences into dnn model will be padded to max_duration
            blank_index: blank index for ctc loss / decoder
            max_utts: Limit number of utterances
            normalize: whether to normalize transcript text
            sort_by_duration: whether or not to sort sequences by increasing duration
            trim: if specified trims leading and trailing silence from an audio signal.
            speed_perturbation: specify if using data contains speed perburbation
        """
        m_paths = manifest_filepath.split(',')
        self.manifest = Manifest(dataset_dir, m_paths, labels, blank_index, pad_to_max=pad_to_max,
                                 max_duration=max_duration,
                                 sort_by_duration=sort_by_duration,
                                 min_duration=min_duration, max_utts=max_utts,
                                 normalize=normalize, speed_perturbation=speed_perturbation)
        self.featurizer = featurizer
        self.blank_index = blank_index
        self.trim = trim
        print(
            "Dataset loaded with {0:.2f} hours. Filtered {1:.2f} hours.".format(
                self.manifest.duration / 3600,
                self.manifest.filtered_duration / 3600))

    def __getitem__(self, index):
        sample = self.manifest[index]
        rn_indx = np.random.randint(len(sample['audio_filepath']))
        duration = sample['audio_duration'][rn_indx] if 'audio_duration' in sample else 0
        offset = sample['offset'] if 'offset' in sample else 0
        features = self.featurizer.process(sample['audio_filepath'][rn_indx],
                                           offset=offset, duration=duration,
                                           trim=self.trim)

        AudioSample = namedtuple('AudioSample', ['waveform',
                                                 'transcript'])
        return AudioSample(features,
                           torch.LongTensor(sample["transcript"]))

    def __len__(self):
        return len(self.manifest)

```





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/11) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by masahi via Apache TVM Discuss <no...@discuss.tvm.ai>.

You are using torch.jit.script. Please try torch.jit.trace.
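
A minimal illustration of the difference, assuming `model` is the loaded RNNT module and using the example shapes from earlier in the thread:

```python
import torch

# torch.jit.script compiles the Python source, keeping control flow, exceptions and
# string formatting, which surfaces as prim::RaiseException, aten::format, etc.
# torch.jit.trace only records the ops executed for one example input, which
# usually leaves a much smaller operator set for the TVM frontend to convert.
example_inputs = (torch.randn(316, 1, 240), torch.randn(316))
traced_encoder = torch.jit.trace(model.encoder, example_inputs).eval()
```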





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/2) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by masahi via Apache TVM Discuss <no...@discuss.tvm.ai>.

Ok, I was able to reproduce the issue. It seems supporting `aten::lstm` is complicated and I'm not an expert on LSTM. I created an issue https://github.com/apache/incubator-tvm/issues/6474 to ask for help.

For now, I recommend exporting the model to ONNX and using our ONNX frontend, since it supports the ONNX LSTM op.
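
A rough sketch of that route (the opset version, input/output names, and shapes below are assumptions, not a tested recipe):

```python
import torch
import onnx
from tvm import relay

# Export the encoder to ONNX; torch.onnx.export traces the module internally.
torch.onnx.export(
    model.encoder,
    (t_audio_signal_e, t_a_sig_length_e),
    "rnnt_encoder.onnx",
    opset_version=11,
    input_names=["audio", "audio_len"],
    output_names=["encoded", "encoded_len"],
)

# Import the ONNX model; the ONNX frontend has a converter for the LSTM op.
onnx_model = onnx.load("rnnt_encoder.onnx")
shape_dict = {"audio": (316, 1, 240), "audio_len": (316,)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
```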





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/14) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

The RNN-T model is from https://zenodo.org/record/3662521.





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/10) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

Sorry, I am only sending you the model-loading part of the code for TVM; our full code cannot be uploaded to a git repo.
Just run this 'import_rnnt.py' and it will reproduce the error.
It needs an 'rnnt.toml' in the same directory.
You also need to download the PyTorch RNN-T checkpoint: https://zenodo.org/record/3662521

# python import_rnnt.py --ckpt 'path to rnnt.pt'

import_rnnt.py
```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
import toml
import torch
from typing import Optional, Tuple
import torchvision
import argparse
from tvm import relay
from tqdm import tqdm
import tvm
import numpy as np
import torch

def get_args():
    """Parse commandline."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ckpt', type=str, required=True, help='The rnnt model path (pytorch ckpt path)')
    parser.add_argument("--model_toml", type=str, default='rnnt.toml',
                        help='relative model configuration path given dataset folder')
    args = parser.parse_args()
    return args


def rnn(rnn, input_size, hidden_size, num_layers, norm=None,
        forget_gate_bias=1.0, dropout=0.0, **kwargs):
    """TODO"""
    if rnn != "lstm":
        raise ValueError(f"Unknown rnn={rnn}")
    if norm not in [None]:
        raise ValueError(f"unknown norm={norm}")

    if rnn == "lstm":
        return LstmDrop(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
            forget_gate_bias=forget_gate_bias,
            **kwargs
        )


class LstmDrop(torch.nn.Module):

    def __init__(self, input_size, hidden_size, num_layers, dropout, forget_gate_bias,
                 **kwargs):
        """Returns an LSTM with forget gate bias init to `forget_gate_bias`.

        Args:
            input_size: See `torch.nn.LSTM`.
            hidden_size: See `torch.nn.LSTM`.
            num_layers: See `torch.nn.LSTM`.
            dropout: See `torch.nn.LSTM`.
            forget_gate_bias: For each layer and each direction, the value to
                initialise the forget gate bias to.

        Returns:
            A `torch.nn.LSTM`.
        """
        super(LstmDrop, self).__init__()

        # Interesting, torch LSTM allows specifying number of
        # layers... Fan-out parallelism.
        # WARNING: Is dropout repeated twice?
        self.lstm = torch.nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout,
        )
        if forget_gate_bias is not None:
            for name, v in self.lstm.named_parameters():
                if "bias_ih" in name:
                    bias = getattr(self.lstm, name)
                    bias.data[hidden_size:2 * hidden_size].fill_(forget_gate_bias)
                if "bias_hh" in name:
                    bias = getattr(self.lstm, name)
                    bias.data[hidden_size:2 * hidden_size].fill_(0)

        self.inplace_dropout = (torch.nn.Dropout(dropout, inplace=True)
                                if dropout else None)

    #zzhen developed
    def forward(self, x: torch.Tensor, h: Optional[Tuple[torch.Tensor, torch.Tensor]] = None):
        x, h = self.lstm(x, h)
        if self.inplace_dropout is not None:
            self.inplace_dropout(x.data)

        return x, h


class StackTime(torch.nn.Module):
    def __init__(self, factor):
        super().__init__()
        self.factor = int(factor)

    def forward(self, x, x_lens):
        # T, B, U
        # x, x_lens = x
        seq = [x]
        for i in range(1, self.factor):
            # This doesn't seem to make much sense...
            tmp = torch.zeros_like(x)
            tmp[:-i, :, :] = x[i:, :, :]
            seq.append(tmp)
        x_lens = torch.ceil(x_lens.float() / self.factor).int()
        # Gross, this is horrible. What a waste of memory...
        return torch.cat(seq, dim=2)[::self.factor, :, :], x_lens


class RNNT(torch.nn.Module):
    """A Recurrent Neural Network Transducer (RNN-T).

    Args:
        in_features: Number of input features per step per batch.
        vocab_size: Number of output symbols (inc blank).
        forget_gate_bias: Total initialized value of the bias used in the
            forget gate. Set to None to use PyTorch's default initialisation.
            (See: http://proceedings.mlr.press/v37/jozefowicz15.pdf)
        batch_norm: Use batch normalization in encoder and prediction network
            if true.
        encoder_n_hidden: Internal hidden unit size of the encoder.
        encoder_rnn_layers: Encoder number of layers.
        pred_n_hidden:  Internal hidden unit size of the prediction network.
        pred_rnn_layers: Prediction network number of layers.
        joint_n_hidden: Internal hidden unit size of the joint network.
        rnn_type: string. Type of rnn in SUPPORTED_RNNS.
    """

    def __init__(self, rnnt=None, num_classes=1, **kwargs):
        super().__init__()
        if kwargs.get("no_featurizer", False):
            in_features = kwargs.get("in_features")
        else:
            feat_config = kwargs.get("feature_config")
            # This may be useful in the future, for MLPerf
            # configuration.
            in_features = feat_config['features'] * \
                feat_config.get("frame_splicing", 1)

        self._pred_n_hidden = rnnt['pred_n_hidden']

        self.encoder_n_hidden = rnnt["encoder_n_hidden"]
        self.encoder_pre_rnn_layers = rnnt["encoder_pre_rnn_layers"]
        self.encoder_post_rnn_layers = rnnt["encoder_post_rnn_layers"]

        self.pred_n_hidden = rnnt["pred_n_hidden"]
        self.pred_rnn_layers = rnnt["pred_rnn_layers"]

        self.encoder = Encoder(in_features,
                               rnnt["encoder_n_hidden"],
                               rnnt["encoder_pre_rnn_layers"],
                               rnnt["encoder_post_rnn_layers"],
                               rnnt["forget_gate_bias"],
                               None if "norm" not in rnnt else rnnt["norm"],
                               rnnt["rnn_type"],
                               rnnt["encoder_stack_time_factor"],
                               rnnt["dropout"],
                               )

        self.prediction = self._predict(
            num_classes,
            rnnt["pred_n_hidden"],
            rnnt["pred_rnn_layers"],
            rnnt["forget_gate_bias"],
            None if "norm" not in rnnt else rnnt["norm"],
            rnnt["rnn_type"],
            rnnt["dropout"],
        )

        self.joint_net = self._joint_net(
            num_classes,
            rnnt["pred_n_hidden"],
            rnnt["encoder_n_hidden"],
            rnnt["joint_n_hidden"],
            rnnt["dropout"],
        )

    def _predict(self, vocab_size, pred_n_hidden, pred_rnn_layers,
                 forget_gate_bias, norm, rnn_type, dropout):
        layers = torch.nn.ModuleDict({
            "embed": torch.nn.Embedding(vocab_size - 1, pred_n_hidden),
            "dec_rnn": rnn(
                rnn=rnn_type,
                input_size=pred_n_hidden,
                hidden_size=pred_n_hidden,
                num_layers=pred_rnn_layers,
                norm=norm,
                forget_gate_bias=forget_gate_bias,
                dropout=dropout,
            ),
        })
        return layers

    def _joint_net(self, vocab_size, pred_n_hidden, enc_n_hidden,
                   joint_n_hidden, dropout):
        layers = [
            torch.nn.Linear(pred_n_hidden + enc_n_hidden, joint_n_hidden),
            torch.nn.ReLU(),
        ] + ([torch.nn.Dropout(p=dropout), ] if dropout else []) + [
            torch.nn.Linear(joint_n_hidden, vocab_size)
        ]
        return torch.nn.Sequential(
            *layers
        )

    # Perhaps what I really need to do is provide a value for
    # state. But why can't I just specify a type for abstract
    # intepretation? That's what I really want!
    # We really want two "states" here...
    def forward(self, batch, state=None):
        # batch: ((x, y), (x_lens, y_lens))

        raise RuntimeError(
            "RNNT::forward is not currently used. "
            "It corresponds to training, where your entire output sequence "
            "is known before hand.")

        # x: TxBxF
        (x, y_packed), (x_lens, y_lens) = batch
        x_packed = torch.nn.utils.rnn.pack_padded_sequence(x, x_lens)

        f, x_lens = self.encode(x_packed)

        g, _ = self.predict(y_packed, state)
        out = self.joint(f, g)

        return out, (x_lens, y_lens)

    def predict(self, y, state=None, add_sos=True):
        """
        B - batch size
        U - label length
        H - Hidden dimension size
        L - Number of decoder layers = 2

        Args:
            y: (B, U)

        Returns:
            Tuple (g, hid) where:
                g: (B, U + 1, H)
                hid: (h, c) where h is the final sequence hidden state and c is
                    the final cell state:
                        h (tensor), shape (L, B, H)
                        c (tensor), shape (L, B, H)
        """
        if isinstance(y, torch.Tensor):
            y = self.prediction["embed"](y)
        elif isinstance(y, torch.nn.utils.rnn.PackedSequence):
            # Teacher-forced training mode
            # (B, U) -> (B, U, H)
            y._replace(data=self.prediction["embed"](y.data))
        else:
            # inference mode
            B = 1 if state is None else state[0].size(1)
            y = torch.zeros((B, 1, self.pred_n_hidden)).to(
                device=self.joint_net[0].weight.device,
                dtype=self.joint_net[0].weight.dtype
            )

        # preprend blank "start of sequence" symbol
        if add_sos:
            B, U, H = y.shape
            start = torch.zeros((B, 1, H)).to(device=y.device, dtype=y.dtype)
            y = torch.cat([start, y], dim=1).contiguous()   # (B, U + 1, H)
        else:
            start = None   # makes del call later easier

        y = y.transpose(0, 1)  # .contiguous()   # (U + 1, B, H)
        g, hid = self.prediction["dec_rnn"](y, state)
        g = g.transpose(0, 1)  # .contiguous()   # (B, U + 1, H)
        del y, start, state
        return g, hid


    def joint(self, f, g):
        """
        f should be shape (B, T, H)
        g should be shape (B, U + 1, H)

        returns:
            logits of shape (B, T, U, K + 1)
        """
        # Combine the input states and the output states
        B, T, H = f.shape
        B, U_, H2 = g.shape

        f = f.unsqueeze(dim=2)   # (B, T, 1, H)
        f = f.expand((B, T, U_, H))

        g = g.unsqueeze(dim=1)   # (B, 1, U + 1, H)
        g = g.expand((B, T, U_, H2))

        inp = torch.cat([f, g], dim=3)   # (B, T, U, 2H)
        res = self.joint_net(inp)
        del f, g, inp
        return res



class Encoder(torch.nn.Module):
    def __init__(self, in_features, encoder_n_hidden,
                 encoder_pre_rnn_layers, encoder_post_rnn_layers,
                 forget_gate_bias, norm, rnn_type, encoder_stack_time_factor,
                 dropout):
        super().__init__()
        self.pre_rnn = rnn(
            rnn=rnn_type,
            input_size=in_features,
            hidden_size=encoder_n_hidden,
            num_layers=encoder_pre_rnn_layers,
            norm=norm,
            forget_gate_bias=forget_gate_bias,
            dropout=dropout,
        )
        self.stack_time = StackTime(factor=encoder_stack_time_factor)
        self.post_rnn = rnn(
            rnn=rnn_type,
            input_size=encoder_stack_time_factor * encoder_n_hidden,
            hidden_size=encoder_n_hidden,
            num_layers=encoder_post_rnn_layers,
            norm=norm,
            forget_gate_bias=forget_gate_bias,
            norm_first_rnn=True,
            dropout=dropout,
        )

    def forward(self, x: torch.Tensor, x_lens: torch.Tensor):
        x, _ = self.pre_rnn(x, None)
        x, x_lens = self.stack_time(x, x_lens)
        x, _ = self.post_rnn(x, None)
        x = x.transpose(0, 1)
        return x, x_lens


def get_rnnt_model(featurizer_config, model_definition, ctc_vocab, ckpt):
    model = RNNT(
        feature_config=featurizer_config,
        rnnt=model_definition['rnnt'],
        num_classes=len(ctc_vocab)
    )
    checkpoint = torch.load(ckpt, map_location="cpu")
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.eval()
    return model

def rnnt_model_to_tvm_mod(model):
    input_shape = (316, 1, 240)
    len_shape = (316,)  # trailing comma so this is a 1-element tuple, not the int 316
    t_audio_signal_e = torch.randn(input_shape)
    t_a_sig_length_e = torch.randn(len_shape)
    model.encoder = torch.jit.trace(model.encoder, (t_audio_signal_e, t_a_sig_length_e)).eval()

    mod, params = relay.frontend.from_pytorch(model.encoder, input_shapes=None)
    mod = relay.transform.RemoveUnusedFunctions()(mod)
    return mod, params

def add_blank_label(labels):
    if not isinstance(labels, list):
        raise ValueError("labels must be a list of symbols")
    labels.append("<BLANK>")
    return labels


def main():
    args = get_args()
    model_definition = toml.load(args.model_toml)
    dataset_vocab = model_definition['labels']['labels']
    ctc_vocab = add_blank_label(dataset_vocab)
    featurizer_config = model_definition['input_eval']

    rnnt_model = get_rnnt_model(featurizer_config, model_definition, ctc_vocab, args.ckpt)
    mod, params = rnnt_model_to_tvm_mod(rnnt_model)

if __name__ == '__main__':
    main()

```

rnnt.toml
```
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019, Myrtle Software Limited. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

model = "RNNT"

[input]
normalize = "per_feature"
sample_rate = 16000
window_size = 0.02
window_stride = 0.01
window = "hann"
features = 80
n_fft = 512
frame_splicing = 3
dither = 0.00001
feat_type = "logfbank"
normalize_transcripts = true
trim_silence = true
pad_to = 0   # TODO
max_duration = 16.7
speed_perturbation = true


cutout_rect_regions = 0
cutout_rect_time = 60
cutout_rect_freq = 25


cutout_x_regions = 2
cutout_y_regions = 2
cutout_x_width = 6
cutout_y_width = 6


[input_eval]
normalize = "per_feature"
sample_rate = 16000
window_size = 0.02
window_stride = 0.01
window = "hann"
features = 80
n_fft = 512
frame_splicing = 3
dither = 0.00001
feat_type = "logfbank"
normalize_transcripts = true
trim_silence = true
pad_to = 0


[rnnt]
rnn_type = "lstm"
encoder_n_hidden = 1024
encoder_pre_rnn_layers = 2
encoder_stack_time_factor = 2
encoder_post_rnn_layers = 3
pred_n_hidden = 320
pred_rnn_layers = 2
forget_gate_bias = 1.0
joint_n_hidden = 512
dropout=0.32


[labels]
labels = [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]

```





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/13) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by masahi via Apache TVM Discuss <no...@discuss.tvm.ai>.

Sorry, can you make a git repo with all the necessary files?





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/12) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

Also, I tried torch.jit.trace, but 'aten::lstm' and 'aten::copy_' are still not supported.
```
Traceback (most recent call last):
  File "/Automation/zzhen/pycharm-community-2019.3.3/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Automation/zzhen/pycharm-community-2019.3.3/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/root/test6_root/zzhen/2020/SamsungGit/0911/quantization/tutorials/quantization/rnnt/quantize_rnnt.py", line 257, in <module>
    main()
  File "/root/test6_root/zzhen/2020/SamsungGit/0911/quantization/tutorials/quantization/rnnt/quantize_rnnt.py", line 244, in main
    mod, params = rnnt_model_to_tvm_mod(rnnt_model)
  File "/root/test6_root/zzhen/2020/SamsungGit/0911/quantization/tutorials/quantization/rnnt/quantize_rnnt.py", line 164, in rnnt_model_to_tvm_mod
    mod, params = relay.frontend.from_pytorch(model.encoder, input_shapes=None)
  File "/root/test6_root/zzhen/2020/SamsungGit/0911/quantization/python/tvm/relay/frontend/pytorch.py", line 2547, in from_pytorch
    _report_missing_conversion(op_names, convert_map)
  File "/root/test6_root/zzhen/2020/SamsungGit/0911/quantization/python/tvm/relay/frontend/pytorch.py", line 2060, in _report_missing_conversion
    raise NotImplementedError(msg)
NotImplementedError: The following operators are not implemented: ['aten::lstm', 'aten::copy_']
```
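
The `aten::copy_` likely comes from the in-place slice assignment in `StackTime.forward` (`tmp[:-i, :, :] = x[i:, :, :]` in rnn.py). Below is a hedged sketch of a variant that builds the same shifted, zero-padded tensors with `torch.cat` instead of in-place writes; `StackTimeNoCopy` is a hypothetical rewrite, not validated against the reference model:

```python
import torch

class StackTimeNoCopy(torch.nn.Module):
    """Hypothetical variant of StackTime that avoids in-place slice assignment,
    one likely source of aten::copy_ in the traced graph."""

    def __init__(self, factor):
        super().__init__()
        self.factor = int(factor)

    def forward(self, x, x_lens):
        # x: (T, B, F)
        seq = [x]
        for i in range(1, self.factor):
            # Shift x forward by i frames and zero-pad the tail; this is the
            # out-of-place equivalent of: tmp = zeros_like(x); tmp[:-i] = x[i:]
            shifted = torch.cat([x[i:, :, :], torch.zeros_like(x[:i, :, :])], dim=0)
            seq.append(shifted)
        x_lens = torch.ceil(x_lens.float() / self.factor).int()
        return torch.cat(seq, dim=2)[::self.factor, :, :], x_lens
```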





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/3) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

decoders.py
```
# Copyright (c) 2019, Myrtle Software Limited. All rights reserved.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch

import torch.nn.functional as F
from model_rnnt import label_collate


class TransducerDecoder:
    """Decoder base class.

    Args:
        alphabet: An Alphabet object.
        blank_symbol: The symbol in `alphabet` to use as the blank during CTC
            decoding.
        model: Model to use for prediction.
    """

    def __init__(self, blank_index, model):
        self._model = model
        self._SOS = -1   # start of sequence
        self._blank_id = blank_index

    def _pred_step(self, label, hidden, device):
        if label == self._SOS:
            return self._model.predict(None, hidden, add_sos=False)
            # return self._model.prediction(None, hidden, add_sos=False)
        if label > self._blank_id:
            label -= 1
        label = label_collate([[label]]).to(device)
        return self._model.predict(label, hidden, add_sos=False)
        # return self._model.prediction(label, hidden, add_sos=False)

    def _joint_step(self, enc, pred, log_normalize=False):
        logits = self._model.joint(enc, pred)[:, 0, 0, :]
        if not log_normalize:
            return logits

        probs = F.log_softmax(logits, dim=len(logits.shape) - 1)

        return probs

    def _get_last_symb(self, labels):
        return self._SOS if labels == [] else labels[-1]
```





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/8) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by zhangzhen507 via Apache TVM Discuss <no...@discuss.tvm.ai>.

model_rnnt.py
```
# Copyright (c) 2019, Myrtle Software Limited. All rights reserved.
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#           http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import torch

from rnn import rnn
from rnn import StackTime


class RNNT(torch.nn.Module):
    """A Recurrent Neural Network Transducer (RNN-T).

    Args:
        in_features: Number of input features per step per batch.
        vocab_size: Number of output symbols (inc blank).
        forget_gate_bias: Total initialized value of the bias used in the
            forget gate. Set to None to use PyTorch's default initialisation.
            (See: http://proceedings.mlr.press/v37/jozefowicz15.pdf)
        batch_norm: Use batch normalization in encoder and prediction network
            if true.
        encoder_n_hidden: Internal hidden unit size of the encoder.
        encoder_rnn_layers: Encoder number of layers.
        pred_n_hidden:  Internal hidden unit size of the prediction network.
        pred_rnn_layers: Prediction network number of layers.
        joint_n_hidden: Internal hidden unit size of the joint network.
        rnn_type: string. Type of rnn in SUPPORTED_RNNS.
    """

    def __init__(self, rnnt=None, num_classes=1, **kwargs):
        super().__init__()
        if kwargs.get("no_featurizer", False):
            in_features = kwargs.get("in_features")
        else:
            feat_config = kwargs.get("feature_config")
            # This may be useful in the future, for MLPerf
            # configuration.
            in_features = feat_config['features'] * \
                feat_config.get("frame_splicing", 1)

        self._pred_n_hidden = rnnt['pred_n_hidden']

        self.encoder_n_hidden = rnnt["encoder_n_hidden"]
        self.encoder_pre_rnn_layers = rnnt["encoder_pre_rnn_layers"]
        self.encoder_post_rnn_layers = rnnt["encoder_post_rnn_layers"]

        self.pred_n_hidden = rnnt["pred_n_hidden"]
        self.pred_rnn_layers = rnnt["pred_rnn_layers"]

        
        self.encoder = Encoder(in_features,
                               rnnt["encoder_n_hidden"],
                               rnnt["encoder_pre_rnn_layers"],
                               rnnt["encoder_post_rnn_layers"],
                               rnnt["forget_gate_bias"],
                               None if "norm" not in rnnt else rnnt["norm"],
                               rnnt["rnn_type"],
                               rnnt["encoder_stack_time_factor"],
                               rnnt["dropout"],
                               )

        self.prediction = self._predict(
            num_classes,
            rnnt["pred_n_hidden"],
            rnnt["pred_rnn_layers"],
            rnnt["forget_gate_bias"],
            None if "norm" not in rnnt else rnnt["norm"],
            rnnt["rnn_type"],
            rnnt["dropout"],
        )
       
        self.joint_net = self._joint_net(
            num_classes,
            rnnt["pred_n_hidden"],
            rnnt["encoder_n_hidden"],
            rnnt["joint_n_hidden"],
            rnnt["dropout"],
        )

   
    def _predict(self, vocab_size, pred_n_hidden, pred_rnn_layers,
                 forget_gate_bias, norm, rnn_type, dropout):
        layers = torch.nn.ModuleDict({
            "embed": torch.nn.Embedding(vocab_size - 1, pred_n_hidden),
            "dec_rnn": rnn(
                rnn=rnn_type,
                input_size=pred_n_hidden,
                hidden_size=pred_n_hidden,
                num_layers=pred_rnn_layers,
                norm=norm,
                forget_gate_bias=forget_gate_bias,
                dropout=dropout,
            ),
        })
        return layers

    def _joint_net(self, vocab_size, pred_n_hidden, enc_n_hidden,
                   joint_n_hidden, dropout):
        layers = [
            torch.nn.Linear(pred_n_hidden + enc_n_hidden, joint_n_hidden),
            torch.nn.ReLU(),
        ] + ([torch.nn.Dropout(p=dropout), ] if dropout else []) + [
            torch.nn.Linear(joint_n_hidden, vocab_size)
        ]
        return torch.nn.Sequential(
            *layers
        )

    # Perhaps what I really need to do is provide a value for
    # state. But why can't I just specify a type for abstract
    # interpretation? That's what I really want!
    # We really want two "states" here...
    def forward(self, batch, state=None):
        # batch: ((x, y), (x_lens, y_lens))

        raise RuntimeError(
            "RNNT::forward is not currently used. "
            "It corresponds to training, where your entire output sequence "
            "is known before hand.")

        # x: TxBxF
        (x, y_packed), (x_lens, y_lens) = batch
        x_packed = torch.nn.utils.rnn.pack_padded_sequence(x, x_lens)

        f, x_lens = self.encode(x_packed)

        g, _ = self.predict(y_packed, state)
        out = self.joint(f, g)

        return out, (x_lens, y_lens)


    def predict(self, y, state=None, add_sos=True):
        """
        B - batch size
        U - label length
        H - Hidden dimension size
        L - Number of decoder layers = 2

        Args:
            y: (B, U)

        Returns:
            Tuple (g, hid) where:
                g: (B, U + 1, H)
                hid: (h, c) where h is the final sequence hidden state and c is
                    the final cell state:
                        h (tensor), shape (L, B, H)
                        c (tensor), shape (L, B, H)
        """
        if isinstance(y, torch.Tensor):
            y = self.prediction["embed"](y)
        elif isinstance(y, torch.nn.utils.rnn.PackedSequence):
            # Teacher-forced training mode
            # (B, U) -> (B, U, H)
            y._replace(data=self.prediction["embed"](y.data))
        else:
            # inference mode
            B = 1 if state is None else state[0].size(1)
            y = torch.zeros((B, 1, self.pred_n_hidden)).to(
                device=self.joint_net[0].weight.device,
                dtype=self.joint_net[0].weight.dtype
            )

        # prepend blank "start of sequence" symbol
        if add_sos:
            B, U, H = y.shape
            start = torch.zeros((B, 1, H)).to(device=y.device, dtype=y.dtype)
            y = torch.cat([start, y], dim=1).contiguous()   # (B, U + 1, H)
        else:
            start = None   # makes del call later easier



        y = y.transpose(0, 1)  # .contiguous()   # (U + 1, B, H)
        g, hid = self.prediction["dec_rnn"](y, state)
        g = g.transpose(0, 1)  # .contiguous()   # (B, U + 1, H)
        del y, start, state
        return g, hid


    def joint(self, f, g):
        """
        f should be shape (B, T, H)
        g should be shape (B, U + 1, H)

        returns:
            logits of shape (B, T, U, K + 1)
        """
        # Combine the input states and the output states
        B, T, H = f.shape
        B, U_, H2 = g.shape

        f = f.unsqueeze(dim=2)   # (B, T, 1, H)
        f = f.expand((B, T, U_, H))

        g = g.unsqueeze(dim=1)   # (B, 1, U + 1, H)
        g = g.expand((B, T, U_, H2))

        inp = torch.cat([f, g], dim=3)   # (B, T, U, 2H)
        res = self.joint_net(inp)
        del f, g, inp
        return res



class Encoder(torch.nn.Module):
    def __init__(self, in_features, encoder_n_hidden,
                 encoder_pre_rnn_layers, encoder_post_rnn_layers,
                 forget_gate_bias, norm, rnn_type, encoder_stack_time_factor,
                 dropout):
        super().__init__()
        self.pre_rnn = rnn(
            rnn=rnn_type,
            input_size=in_features,
            hidden_size=encoder_n_hidden,
            num_layers=encoder_pre_rnn_layers,
            norm=norm,
            forget_gate_bias=forget_gate_bias,
            dropout=dropout,
        )
        self.stack_time = StackTime(factor=encoder_stack_time_factor)
        self.post_rnn = rnn(
            rnn=rnn_type,
            input_size=encoder_stack_time_factor * encoder_n_hidden,
            hidden_size=encoder_n_hidden,
            num_layers=encoder_post_rnn_layers,
            norm=norm,
            forget_gate_bias=forget_gate_bias,
            norm_first_rnn=True,
            dropout=dropout,
        )

    def forward(self, x: torch.Tensor, x_lens: torch.Tensor):
        x, _ = self.pre_rnn(x, None)
        x, x_lens = self.stack_time(x, x_lens)
        x, _ = self.post_rnn(x, None)
        x = x.transpose(0, 1)
        return x, x_lens






def label_collate(labels):
    """Collates the label inputs for the rnn-t prediction network.

    If `labels` is already in torch.Tensor form this is a no-op.

    Args:
        labels: A torch.Tensor List of label indexes or a torch.Tensor.

    Returns:
        A padded torch.Tensor of shape (batch, max_seq_len).
    """

    if isinstance(labels, torch.Tensor):
        return labels.type(torch.int64)
    if not isinstance(labels, (list, tuple)):
        raise ValueError(
            f"`labels` should be a list or tensor not {type(labels)}"
        )

    batch_size = len(labels)
    max_len = max(len(l) for l in labels)

    cat_labels = np.full((batch_size, max_len), fill_value=0.0, dtype=np.int32)
    for e, l in enumerate(labels):
        cat_labels[e, :len(l)] = l
    labels = torch.LongTensor(cat_labels)

    return labels
```
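
To make the shape handling in `RNNT.joint` concrete, here is a small standalone sketch of just the expand-and-concatenate step with dummy tensors; the sizes are arbitrary examples, not taken from the MLPerf configuration.

```
import torch

# Arbitrary example sizes.
B, T, U_, H = 2, 5, 3, 4

f = torch.randn(B, T, H)    # encoder output, one vector per time step
g = torch.randn(B, U_, H)   # prediction network output, one vector per label step

# Broadcast both to (B, T, U_, H) and concatenate along the feature axis,
# mirroring the body of RNNT.joint above.
f_exp = f.unsqueeze(2).expand(B, T, U_, H)
g_exp = g.unsqueeze(1).expand(B, T, U_, H)
inp = torch.cat([f_exp, g_exp], dim=3)

print(inp.shape)   # torch.Size([2, 5, 3, 8]) -> (B, T, U_, 2H)
```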





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/6) to respond.


[Apache TVM Discuss] [Questions] Import RNN-T pytorch model into TVM

Posted by masahi via Apache TVM Discuss <no...@discuss.tvm.ai>.

Can you show me your script so that I can reproduce your problem?





---
[Visit Topic](https://discuss.tvm.apache.org/t/import-rnn-t-pytorch-model-into-tvm/7874/4) to respond.
