Posted to commits@mxnet.apache.org by jx...@apache.org on 2017/12/18 18:36:42 UTC
[incubator-mxnet] branch master updated: Remove defunct demo (#9060)
This is an automated email from the ASF dual-hosted git repository.
jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new 5858d62 Remove defunct demo (#9060)
5858d62 is described below
commit 5858d6255d8ca83cc1c264cd94a3facc8d611718
Author: Simon <11...@users.noreply.github.com>
AuthorDate: Mon Dec 18 10:36:38 2017 -0800
Remove defunct demo (#9060)
* Update for MXNet 1.0. PEP8 fixes to code and misc improvements. Tested under Python 2.7 and Python 3.6 on Sagemaker.
* Remove defunct tutorial page
* Remove defunct demo
* Remove duplicate material
---
docs/tutorials/nlp/nce_loss.md | 38 --
example/speech-demo/README.md | 160 -----
example/speech-demo/config_util.py | 81 ---
example/speech-demo/decode_mxnet.py | 177 ------
example/speech-demo/decode_mxnet.sh | 112 ----
example/speech-demo/default.cfg | 50 --
example/speech-demo/default_timit.cfg | 52 --
example/speech-demo/io_func/__init__.py | 0
example/speech-demo/io_func/convert2kaldi.py | 130 ----
example/speech-demo/io_func/feat_io.py | 412 -------------
.../speech-demo/io_func/feat_readers/__init__.py | 0
example/speech-demo/io_func/feat_readers/common.py | 75 ---
.../io_func/feat_readers/reader_atrack.py | 66 --
.../io_func/feat_readers/reader_bvec.py | 47 --
.../speech-demo/io_func/feat_readers/reader_htk.py | 54 --
.../io_func/feat_readers/reader_kaldi.py | 154 -----
example/speech-demo/io_func/feat_readers/stats.py | 162 -----
.../io_func/feat_readers/writer_kaldi.py | 74 ---
example/speech-demo/io_func/info.py | 23 -
example/speech-demo/io_func/kaldi_parser.py | 219 -------
example/speech-demo/io_func/model_io.py | 275 ---------
example/speech-demo/io_func/regr_feat_io.py | 92 ---
example/speech-demo/io_func/utils.py | 170 ------
example/speech-demo/io_util.py | 671 ---------------------
example/speech-demo/lstm_proj.py | 153 -----
example/speech-demo/make_stats.py | 103 ----
example/speech-demo/python_wrap/Makefile | 13 -
example/speech-demo/python_wrap/ctypes.cc | 244 --------
.../python_wrap/example_usage/README.txt | 14 -
.../speech-demo/python_wrap/example_usage/data.ark | Bin 41 -> 0 bytes
.../speech-demo/python_wrap/example_usage/data.scp | 1 -
.../speech-demo/python_wrap/example_usage/data.txt | 3 -
.../python_wrap/example_usage/example.py | 111 ----
example/speech-demo/run_ami.sh | 141 -----
example/speech-demo/run_timit.sh | 141 -----
example/speech-demo/speechSGD.py | 127 ----
example/speech-demo/tests/test_nothing.py | 19 -
example/speech-demo/tests/test_system.py | 109 ----
example/speech-demo/train_lstm_proj.py | 327 ----------
39 files changed, 4800 deletions(-)
diff --git a/docs/tutorials/nlp/nce_loss.md b/docs/tutorials/nlp/nce_loss.md
deleted file mode 100644
index 564b9e8..0000000
--- a/docs/tutorials/nlp/nce_loss.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# NCE Loss
-This tutorial shows how to use nce-loss to speed up multi-class classification when the number of classes is huge.
-
-You can get the source code for this example on [GitHub](https://github.com/dmlc/mxnet/tree/master/example/nce-loss).
-
-## Toy Examples
-
-* toy_softmax.py. A multi-class example using softmax output
-* toy_nce.py. A multi-class example using nce loss
-
-### Word2Vec
-
-* word2vec.py. A CBOW word2vec example using nce loss
-
-Run word2vec.py with the following command:
-
-```
- ./get_text8.sh
- python word2vec.py
-```
-
-### LSTM
-
-* lstm_word.py. An LSTM example using nce loss
-
-Run lstm_word.py with the following command:
-
-```
- ./get_text8.sh
- python lstm_word.py
-```
-
-## References
-
-For more details, see [http://www.jianshu.com/p/e439b43ea464](http://www.jianshu.com/p/e439b43ea464) (in Chinese).
-
-## Next Steps
-* [MXNet tutorials index](http://mxnet.io/tutorials/index.html)
\ No newline at end of file
diff --git a/example/speech-demo/README.md b/example/speech-demo/README.md
deleted file mode 100644
index 00b0b64..0000000
--- a/example/speech-demo/README.md
+++ /dev/null
@@ -1,160 +0,0 @@
-Speech Acoustic Modeling Example
-================================
-This folder contains examples for speech recognition.
-
-- [lstm_proj.py](lstm_proj.py): Functions for building an LSTM network with or without a projection layer.
-- [io_util.py](io_util.py): Wrapper functions for `DataIter` over speech data.
-- [train_lstm_proj.py](train_lstm_proj.py): Script for training the LSTM acoustic model.
-- [decode_mxnet.py](decode_mxnet.py): Script for decoding with the LSTMP acoustic model.
-- [default.cfg](default.cfg): Configuration for training on the `AMI` SDM1 dataset. Can be used as a template for writing other configuration files.
-- [python_wrap](python_wrap): C wrappers for Kaldi C++ code, built into a `.so`. The Python code that loads the `.so` and calls the C wrapper functions is in `io_func/feat_readers/reader_kaldi.py`.
-
-Connect to Kaldi:
-- [decode_mxnet.sh](decode_mxnet.sh): called by Kaldi to decode an acoustic model trained by MXNet (please select the `simple` method for decoding).
-
-A full recipe:
-- [run_ami.sh](run_ami.sh): a full recipe to train and decode an acoustic model on AMI. It takes features and alignments from Kaldi to train an acoustic model and decode it.
-
-To reproduce the results, use the following steps.
-
-### Build Kaldi
-
-Build Kaldi as **shared libraries** if you have not already done so.
-
-```bash
-cd kaldi/src
-./configure --shared # and other options that you need
-make depend
-make
-```
-
-### Build Python Wrapper
-
-1. Copy or link the attached `python_wrap` folder to `kaldi/src`.
-2. Compile python_wrap/
-
-```
-cd kaldi/src/python_wrap/
-make
-```
-
-### Extract Features and Prepare Frame-level Labels
-
-The acoustic models use *Mel filter-bank* or *MFCC* features as input. You also need Kaldi to run forced alignment, which generates frame-level labels from the text transcriptions. For example, to work on the `AMI` `SDM1` data, you can run `kaldi/egs/ami/s5/run_sdm.sh`. You will need to configure some paths in `kaldi/egs/ami/s5/cmd.sh` and `kaldi/egs/ami/s5/run_sdm.sh` before you can run the examples. Please refer to Kaldi's documentation for more details.
-
-The default `run_sdm.sh` script generates the forced-alignment labels in its stage 7 and saves them in `exp/sdm1/tri3a_ali`. The default script generates MFCC features (13-dimensional). You can train with the MFCC features, or you can create Mel filter-bank features yourself. For example, a script like this can be used to compute Mel filter-bank features using Kaldi.
-
-```bash
-#!/bin/bash -u
-
-. ./cmd.sh
-. ./path.sh
-
-# SDM - Single Distant Microphone
-micid=1 #which mic from array should be used?
-mic=sdm$micid
-
-# Set bash to 'debug' mode, it prints the commands (option '-x') and exits on :
-# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline',
-set -euxo pipefail
-
-# Path where AMI gets downloaded (or where locally available):
-AMI_DIR=$PWD/wav_db # Default,
-data_dir=$PWD/data/$mic
-
-# make filter bank data
-for dset in train dev eval; do
-  steps/make_fbank.sh --nj 48 --cmd "$train_cmd" $data_dir/$dset \
-    $data_dir/$dset/log $data_dir/$dset/data-fbank
-  steps/compute_cmvn_stats.sh $data_dir/$dset \
-    $data_dir/$dset/log $data_dir/$dset/data
-
-  apply-cmvn --utt2spk=ark:$data_dir/$dset/utt2spk \
-    scp:$data_dir/$dset/cmvn.scp scp:$data_dir/$dset/feats.scp \
-    ark,scp:$data_dir/$dset/feats-cmvn.ark,$data_dir/$dset/feats-cmvn.scp
-
-  mv $data_dir/$dset/feats-cmvn.scp $data_dir/$dset/feats.scp
-done
-```
-Here `apply-cmvn` performs mean-variance normalization. The default setup applies it per speaker. A more common approach is to do mean-variance normalization over the whole corpus and then feed the result to the neural network:
-```
- compute-cmvn-stats scp:data/sdm1/train_fbank/feats.scp data/sdm1/train_fbank/cmvn_g.ark
- apply-cmvn --norm-vars=true data/sdm1/train_fbank/cmvn_g.ark scp:data/sdm1/train_fbank/feats.scp ark,scp:data/sdm1/train_fbank_gcmvn/feats.ark,data/sdm1/train_fbank_gcmvn/feats.scp
-```
-Note that Kaldi always looks for features in `feats.scp`, so make sure the normalized features are organized the Kaldi way during decoding.
-
-Finally, you need to put the features and labels together in a file so that MXNet can find them. More specifically, for each data set (train, dev, eval), you will need to create a file like `train_mxnet.feats`, with the following contents:
-
-```
-TRANSFORM scp:feat.scp
-scp:label.scp
-```
-
-Here the `TRANSFORM` is the transformation you want to apply to the features. By default we use `NO_FEATURE_TRANSFORM`. The `scp:` syntax is from Kaldi. The `feat.scp` is typically the file from `data/sdm1/train/feats.scp`, and the `label.scp` is converted from the force-aligned labels located in `exp/sdm1/tri3a_ali`. Because the force-alignments are only generated on the training data, we split the training set into 90/10 parts, and use the 1/10 hold-out as the dev set (validation set). [...]
-
-### Run MXNet Acoustic Model Training
-
-1. Go back to this speech demo directory in MXNet. Make a copy of `default.cfg` and edit necessary items like the path to the dataset you just prepared.
-2. Run `python train_lstm_proj.py --configfile=your-config.cfg`. You can run `python train_lstm_proj.py --help` to see the available options. All configuration parameters can be set in `default.cfg`, in a customized config file, and on the command line (e.g. `--train_batch_size=50`); later sources override earlier ones.
-
-Here are some example outputs that we got from training on the TIMIT dataset.
-
-```
-Example output for TIMIT:
-Summary of dataset ==================
-bucket of len 100 : 3 samples
-bucket of len 200 : 346 samples
-bucket of len 300 : 1496 samples
-bucket of len 400 : 974 samples
-bucket of len 500 : 420 samples
-bucket of len 600 : 90 samples
-bucket of len 700 : 11 samples
-bucket of len 800 : 2 samples
-Summary of dataset ==================
-bucket of len 100 : 0 samples
-bucket of len 200 : 28 samples
-bucket of len 300 : 169 samples
-bucket of len 400 : 107 samples
-bucket of len 500 : 41 samples
-bucket of len 600 : 6 samples
-bucket of len 700 : 3 samples
-bucket of len 800 : 0 samples
-2016-04-21 20:02:40,904 Epoch[0] Train-Acc_exlude_padding=0.154763
-2016-04-21 20:02:40,904 Epoch[0] Time cost=91.574
-2016-04-21 20:02:44,419 Epoch[0] Validation-Acc_exlude_padding=0.353552
-2016-04-21 20:04:17,290 Epoch[1] Train-Acc_exlude_padding=0.447318
-2016-04-21 20:04:17,290 Epoch[1] Time cost=92.870
-2016-04-21 20:04:20,738 Epoch[1] Validation-Acc_exlude_padding=0.506458
-2016-04-21 20:05:53,127 Epoch[2] Train-Acc_exlude_padding=0.557543
-2016-04-21 20:05:53,128 Epoch[2] Time cost=92.390
-2016-04-21 20:05:56,568 Epoch[2] Validation-Acc_exlude_padding=0.548100
-```
-
-The final frame accuracy was around 62%.
-
-### Run decode on the trained acoustic model
-
-1. Estimate senone priors by running `python make_stats.py --configfile=your-config.cfg | copy-feats ark:- ark:label_mean.ark` (edit necessary items like the path to the training dataset). It will generate the label counts in `label_mean.ark`.
-2. Link to the necessary Kaldi decoding setup, e.g. `local/` and `utils/`, and run `./run_ami.sh --model prefix model --num_epoch num`.
-
-Here are the results on the TIMIT and AMI test sets (using the default setup, a 3-layer LSTM with projection layers):
-
-| Corpus | WER (%)     |
-|--------|-------------|
-| TIMIT  | 18.9        |
-| AMI    | 51.7 (42.2) |
-
-Note that for AMI, 42.2 was evaluated on non-overlapped speech. The Kaldi HMM baseline was 67.2% and the DNN baseline was 57.5%.
-
-### Update Feb 07
-
-We updated this demo on Feb 07 (Kaldi c747ed5, MXNet 912a7eb) and added a TIMIT demo script to this folder.
-
-To run the TIMIT demo:
-
-1. cd path/to/kaldi/egs/timit/s5/
-2. ./run.sh (set up the Kaldi TIMIT demo and run it)
-3. ln -s path/to/mxnet/example/speech-demo/* path/to/kaldi/egs/timit/s5/
-4. Set **ali_src, graph_src** and so on in run_timit.sh and default_timit.cfg to the generated folders in kaldi/egs/timit/s5/exp. The demo script uses tri3_ali as the alignment directory.
-5. Set ydim (in default_timit.cfg) to kaldi/egs/timit/s5/exp/tri3/graph/num_pdfs + 1
-6. ./run_timit.sh
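The "Summary of dataset" lines in the removed README's example output group utterances by bucket length. A minimal sketch of how such a summary could be produced, assuming each utterance goes into the smallest bucket that fits it and longer utterances are skipped. `assign_bucket` and `summarize` are hypothetical names for illustration, not functions from the removed `io_util.py`; the bucket list is the one from `default.cfg`.

```python
from bisect import bisect_left
from collections import Counter

# Bucket lengths as listed in the removed default.cfg.
BUCKETS = [100, 200, 300, 400, 500, 600, 700, 800]


def assign_bucket(num_frames, buckets=BUCKETS):
    """Return the smallest bucket length >= num_frames, or None if too long."""
    idx = bisect_left(buckets, num_frames)
    return buckets[idx] if idx < len(buckets) else None


def summarize(utt_lengths, buckets=BUCKETS):
    """Count utterances per bucket; utterances that fit no bucket are skipped."""
    counts = Counter({b: 0 for b in buckets})
    for n in utt_lengths:
        bucket = assign_bucket(n, buckets)
        if bucket is not None:
            counts[bucket] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical utterance lengths in frames, for illustration only.
    for length, count in sorted(summarize([95, 150, 200, 201, 799, 900]).items()):
        print("bucket of len %d : %d samples" % (length, count))
```

Whether the removed iterator dropped or truncated over-long utterances is not stated in the README; skipping them here is an assumption.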
diff --git a/example/speech-demo/config_util.py b/example/speech-demo/config_util.py
deleted file mode 100644
index 6fd6a50..0000000
--- a/example/speech-demo/config_util.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import re
-import os
-import sys
-import argparse
-import mxnet as mx
-import numpy as np
-
-if sys.version_info >= (3, 0):
-    import configparser
-else:
-    import ConfigParser as configparser
-
-
-def parse_args():
-    default_cfg = configparser.ConfigParser()
-    default_cfg.read(os.path.join(os.path.dirname(__file__), 'default.cfg'))
-
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--configfile", help="config file for training parameters")
-
-    # those allow us to overwrite the configs through command line
-    for sec in default_cfg.sections():
-        for name, _ in default_cfg.items(sec):
-            arg_name = '--%s_%s' % (sec, name)
-            doc = 'Overwrite %s in section [%s] of config file' % (name, sec)
-            parser.add_argument(arg_name, help=doc)
-
-    args = parser.parse_args()
-
-    if args.configfile is not None:
-        # now read the user supplied config file to overwrite some values
-        default_cfg.read(args.configfile)
-
-    # now overwrite config from command line options
-    for sec in default_cfg.sections():
-        for name, _ in default_cfg.items(sec):
-            arg_name = ('%s_%s' % (sec, name)).replace('-', '_')
-            if hasattr(args, arg_name) and getattr(args, arg_name) is not None:
-                sys.stderr.write('!! CMDLine overwriting %s.%s:\n' % (sec, name))
-                sys.stderr.write(" '%s' => '%s'\n" % (default_cfg.get(sec, name),
-                                                     getattr(args, arg_name)))
-                default_cfg.set(sec, name, getattr(args, arg_name))
-
-    args.config = default_cfg
-    sys.stderr.write("="*80+"\n")
-    return args
-
-
-def get_checkpoint_path(args):
-    prefix = args.config.get('train', 'prefix')
-    if os.path.isabs(prefix):
-        return prefix
-    return os.path.abspath(os.path.join(os.path.dirname(__file__), 'checkpoints', prefix))
-
-
-def parse_contexts(args):
-    # parse context into Context objects
-    contexts = re.split(r'\W+', args.config.get('train', 'context'))
-    for i, ctx in enumerate(contexts):
-        if ctx[:3] == 'gpu':
-            contexts[i] = mx.context.gpu(int(ctx[3:]))
-        else:
-            contexts[i] = mx.context.cpu(int(ctx[3:]))
-    return contexts
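The removed `parse_args` layers three configuration sources: built-in defaults, an optional user config file, and command-line overrides of the form `--<section>_<name>`. A minimal Python 3 sketch of the same layering, assuming in-memory config text instead of files so that it runs standalone; `load_config`, `DEFAULTS`, and the two sample options are illustrative and not part of the removed module.

```python
import argparse
import configparser

# Stand-in for default.cfg (illustrative values only).
DEFAULTS = """
[train]
batch_size = 40
context = gpu0
"""


def load_config(argv, user_cfg_text=None):
    """Merge defaults, an optional user config, and command-line overrides."""
    cfg = configparser.ConfigParser()
    cfg.read_string(DEFAULTS)

    # Expose every default option as --<section>_<name>.
    parser = argparse.ArgumentParser()
    for sec in cfg.sections():
        for name, _ in cfg.items(sec):
            parser.add_argument("--%s_%s" % (sec, name))
    args = parser.parse_args(argv)

    # A later read overrides values loaded by an earlier one.
    if user_cfg_text is not None:
        cfg.read_string(user_cfg_text)

    # Command-line values take precedence over both config sources.
    for sec in cfg.sections():
        for name, _ in cfg.items(sec):
            value = getattr(args, "%s_%s" % (sec, name), None)
            if value is not None:
                cfg.set(sec, name, value)
    return cfg


if __name__ == "__main__":
    cfg = load_config(["--train_batch_size=50"], "[train]\ncontext = gpu1\n")
    print(cfg.get("train", "batch_size"), cfg.get("train", "context"))
```

Only options present in the defaults get a command-line flag, which mirrors the removed code: an option that appears only in the user config cannot be overridden from the command line.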
diff --git a/example/speech-demo/decode_mxnet.py b/example/speech-demo/decode_mxnet.py
deleted file mode 100644
index deb9c30..0000000
--- a/example/speech-demo/decode_mxnet.py
+++ /dev/null
@@ -1,177 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import re
-import sys
-sys.path.insert(0, "../../python")
-import time
-import logging
-import os.path
-
-import mxnet as mx
-import numpy as np
-
-from lstm_proj import lstm_unroll
-from io_util import BucketSentenceIter, TruncatedSentenceIter, SimpleIter, DataReadStream
-from config_util import parse_args, get_checkpoint_path, parse_contexts
-
-from io_func.feat_readers.writer_kaldi import KaldiWriteOut
-
-# some constants
-METHOD_BUCKETING = 'bucketing'
-METHOD_TBPTT = 'truncated-bptt'
-METHOD_SIMPLE = 'simple'
-
-def prepare_data(args):
-    batch_size = args.config.getint('train', 'batch_size')
-    num_hidden = args.config.getint('arch', 'num_hidden')
-    num_hidden_proj = args.config.getint('arch', 'num_hidden_proj')
-    num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
-
-    init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-    if num_hidden_proj > 0:
-        init_h = [('l%d_init_h'%l, (batch_size, num_hidden_proj)) for l in range(num_lstm_layer)]
-    else:
-        init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-
-    init_states = init_c + init_h
-
-    file_test = args.config.get('data', 'test')
-    file_label_mean = args.config.get('data', 'label_mean')
-    file_format = args.config.get('data', 'format')
-    feat_dim = args.config.getint('data', 'xdim')
-    label_dim = args.config.getint('data', 'ydim')
-
-    test_data_args = {
-        "gpu_chunk": 32768,
-        "lst_file": file_test,
-        "file_format": file_format,
-        "separate_lines":True,
-        "has_labels":False
-    }
-
-    label_mean_args = {
-        "gpu_chunk": 32768,
-        "lst_file": file_label_mean,
-        "file_format": file_format,
-        "separate_lines":True,
-        "has_labels":False
-    }
-
-    test_sets = DataReadStream(test_data_args, feat_dim)
-    label_mean_sets = DataReadStream(label_mean_args, label_dim)
-    return (init_states, test_sets, label_mean_sets)
-
-
-if __name__ == '__main__':
-    args = parse_args()
-    args.config.write(sys.stderr)
-
-    decoding_method = args.config.get('train', 'method')
-    contexts = parse_contexts(args)
-
-    init_states, test_sets, label_mean_sets = prepare_data(args)
-    state_names = [x[0] for x in init_states]
-
-    batch_size = args.config.getint('train', 'batch_size')
-    num_hidden = args.config.getint('arch', 'num_hidden')
-    num_hidden_proj = args.config.getint('arch', 'num_hidden_proj')
-    num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
-    feat_dim = args.config.getint('data', 'xdim')
-    label_dim = args.config.getint('data', 'ydim')
-    out_file = args.config.get('data', 'out_file')
-    num_epoch = args.config.getint('train', 'num_epoch')
-    model_name = get_checkpoint_path(args)
-    logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(message)s')
-
-    # load the model
-    sym, arg_params, aux_params = mx.model.load_checkpoint(model_name, num_epoch)
-
-    if decoding_method == METHOD_BUCKETING:
-        buckets = args.config.get('train', 'buckets')
-        buckets = list(map(int, re.split(r'\W+', buckets)))
-        data_test = BucketSentenceIter(test_sets, buckets, batch_size, init_states, feat_dim=feat_dim, has_label=False)
-        def sym_gen(seq_len):
-            sym = lstm_unroll(num_lstm_layer, seq_len, feat_dim, num_hidden=num_hidden,
-                              num_label=label_dim, take_softmax=True, num_hidden_proj=num_hidden_proj)
-            data_names = ['data'] + state_names
-            label_names = ['softmax_label']
-            return (sym, data_names, label_names)
-
-        module = mx.mod.BucketingModule(sym_gen,
-                                        default_bucket_key=data_test.default_bucket_key,
-                                        context=contexts)
-    elif decoding_method == METHOD_SIMPLE:
-        data_test = SimpleIter(test_sets, batch_size, init_states, feat_dim=feat_dim, label_dim=label_dim,
-                               label_mean_sets=label_mean_sets, has_label=False)
-        def sym_gen(seq_len):
-            sym = lstm_unroll(num_lstm_layer, seq_len, feat_dim, num_hidden=num_hidden,
-                              num_label=label_dim, take_softmax=False, num_hidden_proj=num_hidden_proj)
-            data_names = ['data'] + state_names
-            label_names = []
-            return (sym, data_names, label_names)
-
-        module = mx.mod.BucketingModule(sym_gen,
-                                        default_bucket_key=data_test.default_bucket_key,
-                                        context=contexts)
-
-    else:
-        truncate_len=20
-        data_test = TruncatedSentenceIter(test_sets, batch_size, init_states,
-                                          truncate_len, feat_dim=feat_dim,
-                                          do_shuffling=False, pad_zeros=True, has_label=True)
-
-        sym = lstm_unroll(num_lstm_layer, truncate_len, feat_dim, num_hidden=num_hidden,
-                          num_label=label_dim, output_states=True, num_hidden_proj=num_hidden_proj)
-        data_names = [x[0] for x in data_test.provide_data]
-        label_names = ['softmax_label']
-        module = mx.mod.Module(sym, context=contexts, data_names=data_names,
-                               label_names=label_names)
-    # set the parameters
-    module.bind(data_shapes=data_test.provide_data, label_shapes=None, for_training=False)
-    module.set_params(arg_params=arg_params, aux_params=aux_params)
-
-    kaldiWriter = KaldiWriteOut(None, out_file)
-    kaldiWriter.open_or_fd()
-    for preds, i_batch, batch in module.iter_predict(data_test):
-        label = batch.label[0].asnumpy().astype('int32')
-        posteriors = preds[0].asnumpy().astype('float32')
-        # copy over states
-        if decoding_method == METHOD_BUCKETING:
-            for (ind, utt) in enumerate(batch.utt_id):
-                if utt != "GAP_UTT":
-                    posteriors = np.log(posteriors[:label[0][0],1:] + 1e-20) - np.log(data_train.label_mean).T
-                    kaldiWriter.write(utt, posteriors)
-        elif decoding_method == METHOD_SIMPLE:
-            for (ind, utt) in enumerate(batch.utt_id):
-                if utt != "GAP_UTT":
-                    posteriors = posteriors[:batch.utt_len[0],1:] - np.log(data_test.label_mean[1:]).T
-                    kaldiWriter.write(utt, posteriors)
-        else:
-            outputs = module.get_outputs()
-            # outputs[0] is softmax, 1:end are states
-            for i in range(1, len(outputs)):
-                outputs[i].copyto(data_test.init_state_arrays[i-1])
-            for (ind, utt) in enumerate(batch.utt_id):
-                if utt != "GAP_UTT":
-                    posteriors = np.log(posteriors[:,1:])# - np.log(data_train.label_mean).T
-                    kaldiWriter.write(utt, posteriors)
-
-
-    kaldiWriter.close()
-    args.config.write(sys.stderr)
-
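The bucketing branch of the removed decode loop subtracts log label priors from log posteriors before handing the scores to Kaldi. A minimal pure-Python sketch of that conversion follows; the removed script did the same arithmetic on NumPy arrays and also dropped column 0. `scaled_log_likelihoods` is a hypothetical name, and flooring with `max` (rather than adding `1e-20` as the removed code did) is an assumption made here to keep the example simple.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-20):
    """Return log p(state | frame) - log p(state) for every frame and state.

    posteriors: list of frames, each a list of per-state probabilities.
    priors:     list of per-state prior probabilities (same state order).
    floor:      lower bound applied before taking logs (an assumption here).
    """
    log_priors = [math.log(max(p, floor)) for p in priors]
    return [
        [math.log(max(p, floor)) - lp for p, lp in zip(frame, log_priors)]
        for frame in posteriors
    ]


if __name__ == "__main__":
    # One frame, two states, with hypothetical probabilities.
    print(scaled_log_likelihoods([[0.5, 0.5]], [0.25, 0.75]))
```

Dividing by the prior turns state posteriors into quantities proportional to the likelihoods an HMM decoder expects, which is why the README asks for senone priors to be estimated before decoding.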
diff --git a/example/speech-demo/decode_mxnet.sh b/example/speech-demo/decode_mxnet.sh
deleted file mode 100755
index d300d0e..0000000
--- a/example/speech-demo/decode_mxnet.sh
+++ /dev/null
@@ -1,112 +0,0 @@
-#!/bin/bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-
-# Copyright 2012-2013 Karel Vesely, Daniel Povey
-# 2015 Yu Zhang
-# Apache 2.0
-
-# Begin configuration section.
-nnet= # Optionally pre-select network to use for getting state-likelihoods
-feature_transform= # Optionally pre-select feature transform (in front of nnet)
-model= # Optionally pre-select transition model
-class_frame_counts= # Optionally pre-select class-counts used to compute PDF priors
-
-stage=0 # stage=1 skips lattice generation
-nj=4
-cmd=run.pl
-max_active=7000 # maximum of active tokens
-min_active=200 #minimum of active tokens
-max_mem=50000000 # limit the fst-size to 50MB (larger fsts are minimized)
-beam=13.0 # GMM:13.0
-latbeam=8.0 # GMM:6.0
-acwt=0.10 # GMM:0.0833, note: only really affects pruning (scoring is on lattices).
-scoring_opts="--min-lmwt 1 --max-lmwt 10"
-skip_scoring=false
-use_gpu_id=-1 # disable gpu
-#parallel_opts="-pe smp 2" # use 2 CPUs (1 DNN-forward, 1 decoder)
-parallel_opts= # use 2 CPUs (1 DNN-forward, 1 decoder)
-# End configuration section.
-
-echo "$0 $@" # Print the command line for logging
-
-[ -f ./path.sh ] && . ./path.sh; # source the path.
-. parse_options.sh || exit 1;
-
-graphdir=$1
-data=$2
-dir=$3
-srcdir=`dirname $dir`; # The model directory is one level up from decoding directory.
-sdata=$data/split$nj;
-
-mxstring=$4
-
-mkdir -p $dir/log
-[[ -d $sdata && $data/feats.scp -ot $sdata ]] || split_data.sh $data $nj || exit 1;
-echo $nj > $dir/num_jobs
-
-if [ -z "$model" ]; then # if --model <mdl> was not specified on the command line...
- if [ -z $iter ]; then model=$srcdir/final.mdl;
- else model=$srcdir/$iter.mdl; fi
-fi
-
-for f in $model $graphdir/HCLG.fst; do
- [ ! -f $f ] && echo "decode_mxnet.sh: no such file $f" && exit 1;
-done
-
-
-# check that files exist
-for f in $sdata/1/feats.scp $model $graphdir/HCLG.fst; do
- [ ! -f $f ] && echo "$0: no such file $f" && exit 1;
-done
-
-# PREPARE THE LOG-POSTERIOR COMPUTATION PIPELINE
-if [ -z "$class_frame_counts" ]; then
- class_frame_counts=$srcdir/ali_train_pdf.counts
-else
- echo "Overriding class_frame_counts by $class_frame_counts"
-fi
-
-# Create the feature stream:
-feats="scp:$sdata/JOB/feats.scp"
-inputfeats="$sdata/JOB/mxnetInput.scp"
-
-
-if [ -f $sdata/1/feats.scp ]; then
- $cmd JOB=1:$nj $dir/log/make_input.JOB.log \
- echo NO_FEATURE_TRANSFORM scp:$sdata/JOB/feats.scp \> $inputfeats
-fi
-
-# Run the decoding in the queue
-if [ $stage -le 0 ]; then
- $cmd $parallel_opts JOB=1:$nj $dir/log/decode.JOB.log \
- $mxstring --data_test $inputfeats \| \
- latgen-faster-mapped --min-active=$min_active --max-active=$max_active --max-mem=$max_mem --beam=$beam --lattice-beam=$latbeam \
- --acoustic-scale=$acwt --allow-partial=true --word-symbol-table=$graphdir/words.txt \
- $model $graphdir/HCLG.fst ark:- "ark:|gzip -c > $dir/lat.JOB.gz" || exit 1;
-fi
-
-# Run the scoring
-if ! $skip_scoring ; then
- [ ! -x local/score.sh ] && \
- echo "Not scoring because local/score.sh does not exist or not executable." && exit 1;
- local/score.sh $scoring_opts --cmd "$cmd" $data $graphdir $dir || exit 1;
-fi
-
-exit 0;
diff --git a/example/speech-demo/default.cfg b/example/speech-demo/default.cfg
deleted file mode 100644
index 072a4ae..0000000
--- a/example/speech-demo/default.cfg
+++ /dev/null
@@ -1,50 +0,0 @@
-[data]
-kaldi_root =
-train = /home/chiyuan/download/kaldi/egs/ami/s5/exp/sdm1/data-for-mxnet/train.feats
-dev = /home/chiyuan/download/kaldi/egs/ami/s5/exp/sdm1/data-for-mxnet/dev.feats
-test =
-out_file = |
-format = kaldi
-xdim = 40
-ydim = 3920
-label_mean = label_mean.feats
-[arch]
-num_hidden = 1024
-# set it to zero if you want a regular LSTM
-num_hidden_proj = 512
-num_lstm_layer = 3
-
-[train]
-batch_size = 40
-buckets = 100, 200, 300, 400, 500, 600, 700, 800
-num_epoch = 12
-
-# used only if method is truncated-bptt
-truncate_len = 20
-
-# gpu0, gpu1
-context = gpu0
-
-# bucketing, truncated-bptt
-method = truncated-bptt
-
-# checkpoint prefix
-prefix = ami
-
-learning_rate = 1
-decay_factor = 2
-decay_lower_bound = 1e-6
-
-optimizer = speechSGD
-momentum = 0.9
-
-# set to 0 to disable gradient clipping
-clip_gradient = 0
-
-# uniform, normal, xavier
-initializer = Uniform
-init_scale = 0.05
-weight_decay = 0.008
-
-# show progress every how many batches
-show_every = 1000
diff --git a/example/speech-demo/default_timit.cfg b/example/speech-demo/default_timit.cfg
deleted file mode 100644
index 2e0cd2a..0000000
--- a/example/speech-demo/default_timit.cfg
+++ /dev/null
@@ -1,52 +0,0 @@
-[data]
-kaldi_root =
-train = /home/sooda/speech/kaldi/egs/timit/s5/data/train/train.feats
-dev = /home/sooda/speech/kaldi/egs/timit/s5/data/dev/dev.feats
-test =
-out_file = |
-format = kaldi
-xdim = 13
-ydim = 1939
-#ydim = 1909
-label_mean = label_mean.feats
-[arch]
-num_hidden = 1024
-# set it to zero if you want a regular LSTM
-num_hidden_proj = 512
-num_lstm_layer = 3
-
-[train]
-batch_size = 40
-buckets = 100, 200, 300, 400, 500, 600, 700, 800
-num_epoch = 12
-
-# used only if method is truncated-bptt
-truncate_len = 20
-
-# gpu0, gpu1
-context = gpu0
-
-# bucketing, truncated-bptt
-method = truncated-bptt
-#method = bucketing
-
-# checkpoint prefix
-prefix = timit
-
-learning_rate = 1
-decay_factor = 2
-decay_lower_bound = 1e-6
-
-optimizer = speechSGD
-momentum = 0.9
-
-# set to 0 to disable gradient clipping
-clip_gradient = 0
-
-# uniform, normal, xavier
-initializer = Uniform
-init_scale = 0.05
-weight_decay = 0.008
-
-# show progress every how many batches
-show_every = 1000
diff --git a/example/speech-demo/io_func/__init__.py b/example/speech-demo/io_func/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/example/speech-demo/io_func/convert2kaldi.py b/example/speech-demo/io_func/convert2kaldi.py
deleted file mode 100644
index eac8ee6..0000000
--- a/example/speech-demo/io_func/convert2kaldi.py
+++ /dev/null
@@ -1,130 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# Copyright 2013 Yajie Miao Carnegie Mellon University
-
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
-# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
-# MERCHANTABLITY OR NON-INFRINGEMENT.
-# See the Apache 2 License for the specific language governing permissions and
-# limitations under the License.
-
-import numpy as np
-import os
-import sys
-
-from StringIO import StringIO
-import json
-import utils.utils as utils
-from model_io import string_2_array
-
-# Various functions to convert models into Kaldi formats
-def _nnet2kaldi(nnet_spec, set_layer_num = -1, filein='nnet.in',
- fileout='nnet.out', activation='sigmoid', withfinal=True):
- _nnet2kaldi_main(nnet_spec, set_layer_num=set_layer_num, filein=filein,
- fileout=fileout, activation=activation, withfinal=withfinal, maxout=False)
-
-def _nnet2kaldi_maxout(nnet_spec, pool_size = 1, set_layer_num = -1,
- filein='nnet.in', fileout='nnet.out', activation='sigmoid', withfinal=True):
- _nnet2kaldi_main(nnet_spec, set_layer_num=set_layer_num, filein=filein,
- fileout=fileout, activation=activation, withfinal=withfinal,
- pool_size = 1, maxout=True)
-
-def _nnet2kaldi_main(nnet_spec, set_layer_num = -1, filein='nnet.in',
- fileout='nnet.out', activation='sigmoid', withfinal=True, maxout=False):
- elements = nnet_spec.split(':')
- layers = []
- for x in elements:
- layers.append(int(x))
- if set_layer_num == -1:
- layer_num = len(layers) - 1
- else:
- layer_num = set_layer_num + 1
- nnet_dict = {}
- nnet_dict = utils.pickle_load(filein)
-
- fout = open(fileout, 'wb')
- for i in xrange(layer_num - 1):
- input_size = int(layers[i])
- if maxout:
- output_size = int(layers[i + 1]) * pool_size
- else:
- output_size = int(layers[i + 1])
- W_layer = []
- b_layer = ''
- for rowX in xrange(output_size):
- W_layer.append('')
-
- dict_key = str(i) + ' ' + activation + ' W'
- matrix = string_2_array(nnet_dict[dict_key])
-
- for x in xrange(input_size):
- for t in xrange(output_size):
- W_layer[t] = W_layer[t] + str(matrix[x][t]) + ' '
-
- dict_key = str(i) + ' ' + activation + ' b'
- vector = string_2_array(nnet_dict[dict_key])
- for x in xrange(output_size):
- b_layer = b_layer + str(vector[x]) + ' '
-
- fout.write('<affinetransform> ' + str(output_size) + ' ' + str(input_size) + '\n')
- fout.write('[' + '\n')
- for x in xrange(output_size):
- fout.write(W_layer[x].strip() + '\n')
- fout.write(']' + '\n')
- fout.write('[ ' + b_layer.strip() + ' ]' + '\n')
- if maxout:
- fout.write('<maxout> ' + str(int(layers[i + 1])) + ' ' + str(output_size) + '\n')
- else:
- fout.write('<sigmoid> ' + str(output_size) + ' ' + str(output_size) + '\n')
-
- if withfinal:
- input_size = int(layers[-2])
- output_size = int(layers[-1])
- W_layer = []
- b_layer = ''
- for rowX in xrange(output_size):
- W_layer.append('')
-
- dict_key = 'logreg W'
- matrix = string_2_array(nnet_dict[dict_key])
- for x in xrange(input_size):
- for t in xrange(output_size):
- W_layer[t] = W_layer[t] + str(matrix[x][t]) + ' '
-
-
- dict_key = 'logreg b'
- vector = string_2_array(nnet_dict[dict_key])
- for x in xrange(output_size):
- b_layer = b_layer + str(vector[x]) + ' '
-
- fout.write('<affinetransform> ' + str(output_size) + ' ' + str(input_size) + '\n')
- fout.write('[' + '\n')
- for x in xrange(output_size):
- fout.write(W_layer[x].strip() + '\n')
- fout.write(']' + '\n')
- fout.write('[ ' + b_layer.strip() + ' ]' + '\n')
- fout.write('<softmax> ' + str(output_size) + ' ' + str(output_size) + '\n')
-
- fout.close();
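Note on the removed converter: `_nnet2kaldi_main` above walked the pickled weights one element at a time to emit text-format `<affinetransform>` blocks, writing each weight matrix transposed (one text row per output unit). The sketch below reproduces that layout in Python 3, assuming the weights are already NumPy arrays of shape (input_size, output_size). The name `format_affine_layer` is hypothetical and was not part of the removed demo; unlike the removed code, which always wrote `<sigmoid>` for non-maxout layers, the sketch takes the activation tag as a parameter.

```python
import numpy as np


def format_affine_layer(weights, bias, activation="sigmoid"):
    """Render one layer in the text layout the removed converter produced.

    weights: array-like of shape (input_size, output_size)
    bias:    array-like of shape (output_size,)
    """
    weights = np.asarray(weights, dtype=float)
    bias = np.asarray(bias, dtype=float)
    input_size, output_size = weights.shape
    lines = ["<affinetransform> {} {}".format(output_size, input_size), "["]
    # One text row per output unit, i.e. the transpose of the stored matrix.
    for row in weights.T:
        lines.append(" ".join(str(float(v)) for v in row))
    lines.append("]")
    lines.append("[ " + " ".join(str(float(v)) for v in bias) + " ]")
    lines.append("<{}> {} {}".format(activation, output_size, output_size))
    return "\n".join(lines) + "\n"
```

Building the rows from `weights.T` avoids the nested string concatenation of the original loop while producing the same row order.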
diff --git a/example/speech-demo/io_func/feat_io.py b/example/speech-demo/io_func/feat_io.py
deleted file mode 100644
index 6a7e424..0000000
--- a/example/speech-demo/io_func/feat_io.py
+++ /dev/null
@@ -1,412 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import os
-import sys
-import random
-import shlex
-import time
-import re
-
-from utils import to_bool
-from .feat_readers.common import *
-from .feat_readers import stats
-
-class DataReadStream(object):
-
- SCHEMA = {
- "type": "object",
- "properties": {
- "gpu_chunk": {"type": ["string", "integer"], "required": False},
-
- "lst_file": {"type": "string"},
- "separate_lines": {"type": ["string", "integer", "boolean"], "required": False},
- "has_labels": {"type": ["string", "integer", "boolean"], "required": False},
-
- "file_format": {"type": "string"},
- "train_stat": {"type": "string", "required": False},
- "offset_labels": {"type": ["string", "integer", "boolean"], "required": False},
-
- #"XXXchunk": {"type": ["string", "integer"], "required": False},
- "max_feats": {"type": ["string", "integer"], "required": False},
- "shuffle": {"type": ["string", "integer", "boolean"], "required": False},
-
- "seed": {"type": ["string", "integer"], "required": False},
- "_num_splits": {"type": ["string", "integer"], "required": False},
- "_split_id": {"type": ["string", "integer"], "required": False}
- }
- }
-
- END_OF_DATA = -1
- END_OF_PARTITION = -2
- END_OF_SEQ = (None, None, None)
- def __init__(self, dataset_args, n_ins):
-
- # stats
- self.mean = None
- self.std = None
- if 'train_stat' in dataset_args.keys():
- train_stat = dataset_args['train_stat']
- featureStats = stats.FeatureStats()
- featureStats.Load(train_stat)
- self.mean = featureStats.GetMean()
- self.std = featureStats.GetInvStd()
-
- # open lstfile
- file_path = dataset_args["lst_file"]
- if file_path.endswith('.gz'):
- file_read = gzip.open(file_path, 'r')
- else:
- file_read = open(file_path, 'r')
-
- separate_lines = False
- if "separate_lines" in dataset_args:
- separate_lines = to_bool(dataset_args["separate_lines"])
-
- self.has_labels = True
- if "has_labels" in dataset_args:
- self.has_labels = to_bool(dataset_args["has_labels"])
-
- # parse it, file_lst is a list of (featureFile, labelFile) pairs in the input set
- lines = [ln.strip() for ln in file_read]
- lines = [ln for ln in lines if ln != "" ]
-
- if self.has_labels:
- if separate_lines:
- if len(lines) % 2 != 0:
- print("List has mis-matched number of feature files and label files")
- sys.exit(1)
- self.orig_file_lst = []
- for i in xrange(0, len(lines), 2):
- self.orig_file_lst.append((lines[i], lines[i+1]))
- else:
- self.orig_file_lst = []
- for i in xrange(len(lines)):
- pair = re.compile("\s+").split(lines[i])
- if len(pair) != 2:
- print(lines[i])
- print("Each line in the train and eval lists must contain feature file and label file separated by space character")
- sys.exit(1)
- self.orig_file_lst.append(pair)
- else:
- # no labels
- self.orig_file_lst = []
- for i in xrange(0, len(lines), 1):
- self.orig_file_lst.append((lines[i], None))
-
- # save arguments
-
- self.n_ins = n_ins
- self.file_format = dataset_args['file_format']
-
- self.file_format = "htk"
- if 'file_format' in dataset_args:
- self.file_format = dataset_args['file_format']
-
- self.offsetLabels = False
- if 'offset_labels' in dataset_args:
- self.offsetLabels = to_bool(dataset_args['offset_labels'])
-
- self.chunk_size = 32768
- if 'gpu_chunk' in dataset_args:
- self.chunk_size = int(dataset_args['gpu_chunk'])
-
- self.maxFeats = 0
- if "max_feats" in dataset_args:
- self.maxFeats = int(dataset_args["max_feats"])
- if self.maxFeats == 0:
- self.maxFeats = sys.maxint
-
- self.shuffle = True
- if 'shuffle' in dataset_args:
- self.shuffle = to_bool(dataset_args['shuffle'])
-
- self.seed = None
- if "seed" in dataset_args:
- self.seed = int(dataset_args["seed"])
-
- if int("_split_id" in dataset_args) + int("_num_splits" in dataset_args) == 1:
- raise Exception("_split_id must be used with _num_splits")
- self.num_splits = 0
- if "_num_splits" in dataset_args:
- self.num_splits = int(dataset_args["_num_splits"])
- self.split_id = dataset_args["_split_id"]
-
- # internal state
- self.split_parts = False
- self.by_matrix = False
- self.x = numpy.zeros((self.chunk_size, self.n_ins), dtype=numpy.float32)
- if self.has_labels:
- self.y = numpy.zeros((self.chunk_size,), dtype=numpy.int32)
- else:
- self.y = None
- self.numpy_rng = numpy.random.RandomState(self.seed)
-
- #self.make_shared()
- self.initialize_read()
-
- def read_by_part(self):
- if self.file_format in ["kaldi"]:
- self.read_by_matrix()
- else: # htk
- self.split_parts = True
-
- def read_by_matrix(self):
- self.by_matrix = True
-
-
- def get_shared(self):
- return self.shared_x, self.shared_y
-
- def initialize_read(self):
- self.file_lst = self.orig_file_lst[:]
- if self.shuffle:
- self.numpy_rng.shuffle(self.file_lst)
- self.fileIndex = 0
- self.totalFrames = 0
- self.reader = None
- self.crossed_part = False
- self.done = False
- self.utt_id = None
- self.queued_feats = None
- self.queued_tgts = None
-
- def _end_of_data(self):
- return self.totalFrames >= self.maxFeats or self.fileIndex >= len(self.file_lst)
-
- def _queue_get(self, at_most):
- # if we have frames/labels queued, return at_most of those and queue the rest
- if self.queued_feats is None:
- return None
-
- num_queued = self.queued_feats.shape[0]
- at_most = min(at_most, num_queued)
-
- if at_most == num_queued: # no leftover after the split
- feats, tgts = self.queued_feats, self.queued_tgts
- self.queued_feats = None
- self.queued_tgts = None
- else:
- feats, self.queued_feats = numpy.array_split(self.queued_feats, [at_most])
- if self.queued_tgts is not None:
- tgts, self.queued_tgts = numpy.array_split(self.queued_tgts, [at_most])
- else:
- tgts = None
-
- return feats, tgts
-
- def _queue_excess(self, at_most, feats, tgts):
- assert(self.queued_feats is None)
- num_supplied = feats.shape[0]
-
- if num_supplied > at_most:
- feats, self.queued_feats = numpy.array_split(feats, [at_most])
- if tgts is not None:
- tgts, self.queued_tgts = numpy.array_split(tgts, [at_most])
-
- return feats, tgts
-
- # Returns frames/labels (if there are any) or None (otherwise) for current partition
- # Always set the pointers to the next partition
- def _load_fn(self, at_most):
- tup = self._queue_get(at_most)
- if tup is not None:
- return tup
-
- if self.reader is None:
- featureFile, labelFile = self.file_lst[self.fileIndex]
- self.reader = getReader(self.file_format, featureFile, labelFile)
-
- if self.reader.IsDone():
- self.fileIndex += 1
- self.reader.Cleanup()
- self.reader = None # cleanup
- return None
-
- tup = self.reader.Read()
- if tup is None:
- self.fileIndex += 1
- self.reader.Cleanup()
- self.reader = None # cleanup
- return None
-
- feats, tgts = tup
-
- # normalize here
- if self.mean is not None:
- feats -= self.mean
- if self.std is not None:
- feats *= self.std
-
- self.utt_id = self.reader.GetUttId()
-
- if feats.shape[1] != self.n_ins:
- errMsg = "Dimension of features read ({}) does not match specified dimensions ({})".format(feats.shape[1], self.n_ins)
- raise FeatureException(errMsg)
- if self.has_labels and tgts is not None:
- if feats.shape[0] != tgts.shape[0]:
- errMsg = "Number of frames in feature ({}) and label ({}) files does not match".format(self.reader.featureFile, self.reader.labelFile)
- raise FeatureException(errMsg)
-
- if self.offsetLabels:
- tgts = numpy.add(tgts, - 1)
-
- feats, tgts = self._queue_excess(at_most, feats, tgts)
-
- return feats, tgts
-
- def current_utt_id(self):
- assert(self.by_matrix or self.split_parts)
- return self.utt_id
-
- def load_next_seq(self):
- if self.done:
- return DataReadStream.END_OF_SEQ
- if self._end_of_data():
- if self.reader is not None:
- self.reader.Cleanup()
- self.reader = None
- self.done = True
- return DataReadStream.END_OF_SEQ
-
- num_feats = 0
- old_fileIndex = self.fileIndex
-
- self.utt_id = None
-
- tup = self._load_fn(self.chunk_size)
- if tup is None:
- return DataReadStream.END_OF_SEQ
- (loaded_feats, loaded_tgts) = tup
- return loaded_feats, loaded_tgts, self.utt_id
-
-
- def load_next_block(self):
- # if anything left...
- # set_value
-
- if self.crossed_part:
- self.crossed_part = False
- if not self.by_matrix: # <--- THERE IS A BUG IN THIS
- return DataReadStream.END_OF_PARTITION
- if self.done:
- return DataReadStream.END_OF_DATA
- if self._end_of_data():
- if self.reader is not None:
- self.reader.Cleanup()
- self.reader = None # cleanup
- self.done = True
- return DataReadStream.END_OF_DATA
-
- # keep loading features until we pass a partition or EOF
-
- num_feats = 0
- old_fileIndex = self.fileIndex
-
- self.utt_id = None
-
- while num_feats < self.chunk_size:
- if self.split_parts:
- if old_fileIndex != self.fileIndex:
- self.crossed_part = True
- break
-
- if self._end_of_data():
- break
-
- tup = self._load_fn(self.chunk_size - num_feats)
- if tup is None:
- continue
-
- (loaded_feat, loaded_label) = tup
-
- if self.has_labels and loaded_label is None:
- print("Missing labels for: ", self.utt_id, file=sys.stderr)
- continue
-
- numFrames = loaded_feat.shape[0]
-
- # limit loaded_feat, loaded_label, and numFrames to maximum allowed
- allowed = self.maxFeats - self.totalFrames
- if numFrames > allowed:
- loaded_feat = loaded_feat[0:allowed]
- if self.has_labels:
- loaded_label = loaded_label[0:allowed]
- numFrames = allowed
- assert(numFrames == loaded_feat.shape[0])
-
- self.totalFrames += numFrames
- new_num_feats = num_feats + numFrames
-
- # if the x and y buffers are too small, make bigger ones
- # not possible any more; buffers are always fixed
- """
- if new_num_feats > self.x.shape[0]:
- newx = numpy.zeros((new_num_feats, self.n_ins), dtype=numpy.float32)
- newx[0:num_feats] = self.x[0:num_feats]
- self.x = newx
-
- if self.has_labels:
- newy = numpy.zeros((new_num_feats,), dtype=numpy.int32)
- newy[0:num_feats] = self.y[0:num_feats]
- self.y = newy
- """
-
- # place into [num_feats:num_feats+num_loaded]
- self.x[num_feats:new_num_feats] = loaded_feat
- if self.has_labels:
- self.y[num_feats:new_num_feats] = loaded_label
-
- num_feats = new_num_feats
-
- if self.by_matrix:
- break
-
- # if we loaded features, shuffle and copy to shared
- if num_feats != 0:
-
- if self.shuffle:
- x = self.x[0:num_feats]
- state = self.numpy_rng.get_state()
- self.numpy_rng.shuffle(x)
- self.x[0:num_feats] = x
-
- if self.has_labels:
- y = self.y[0:num_feats]
- self.numpy_rng.set_state(state)
- self.numpy_rng.shuffle(y)
- self.y[0:num_feats] = y
-
- assert(self.x.shape == (self.chunk_size, self.n_ins))
- self.shared_x.set_value(self.x, borrow = True)
- if self.has_labels:
- self.shared_y.set_value(self.y, borrow = True)
-
- #import hashlib
- #print self.totalFrames, self.x.sum(), hashlib.sha1(self.x.view(numpy.float32)).hexdigest()
-
- if self.by_matrix:
- self.crossed_part = True
-
- return num_feats
-
- def get_state(self):
- return self.numpy_rng.get_state()
-
- def set_state(self, state):
- self.numpy_rng.set_state(state)
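Note on the removed reader: `load_next_block` above kept features and labels aligned while shuffling by saving the RNG state, shuffling `x`, restoring the state, and then shuffling `y`, so both arrays receive the same permutation. The sketch below isolates that trick, assuming NumPy's legacy `RandomState` API as used in the removed code. `shuffle_in_unison` is a hypothetical name, not something the demo defined.

```python
import numpy as np


def shuffle_in_unison(x, y, rng):
    """Shuffle x and y in place along axis 0 with the same permutation."""
    assert x.shape[0] == y.shape[0]
    state = rng.get_state()   # remember where the generator is
    rng.shuffle(x)            # permutes the rows of x
    rng.set_state(state)      # rewind so the next shuffle repeats the draws
    rng.shuffle(y)            # same permutation applied to y


rng = np.random.RandomState(0)
x = np.arange(10, dtype=np.float32).reshape(5, 2)   # row i is [2i, 2i + 1]
y = np.arange(5, dtype=np.int32)                     # label i belongs to row i
shuffle_in_unison(x, y, rng)
```

An alternative that does not depend on replaying generator state is to draw one index array with `rng.permutation(len(x))` and index both arrays with it, at the cost of a copy.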
diff --git a/example/speech-demo/io_func/feat_readers/__init__.py b/example/speech-demo/io_func/feat_readers/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/example/speech-demo/io_func/feat_readers/common.py b/example/speech-demo/io_func/feat_readers/common.py
deleted file mode 100644
index 742d3e2..0000000
--- a/example/speech-demo/io_func/feat_readers/common.py
+++ /dev/null
@@ -1,75 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import numpy
-import os
-
-class ByteOrder:
- LittleEndian, BigEndian = range(2)
-
-class FeatureException(Exception):
- def __init__(self,msg):
- self.msg = msg
- def __str__(self):
- return repr(self.msg)
-
-def ReadLabel(filename):
- labels = numpy.loadtxt(filename, ndmin=1)
- return labels.astype(numpy.int32)
-
-class BaseReader():
- def __init__(self, featureFile, labelFile, byteOrder=None):
- self.byteOrder = byteOrder
- self.featureFile = featureFile
- self.labelFile = labelFile
- self.done = False
-
- def _markDone(self):
- self.done = True
-
- def IsDone(self):
- return self.done
-
- def Read(self):
- pass
-
- def Cleanup(self):
- pass
-
- # no slashes or weird characters
- def GetUttId(self):
- return os.path.basename(self.featureFile)
-
-def getReader(fileformat, featureFile, labelFile):
- if fileformat.lower() == 'htk':
- import reader_htk
- return reader_htk.htkReader(featureFile, labelFile, ByteOrder.BigEndian)
- elif fileformat.lower() == 'htk_little':
- import reader_htk
- return reader_htk.htkReader(featureFile, labelFile, ByteOrder.LittleEndian)
- elif fileformat.lower() == 'bvec':
- import reader_bvec
- return reader_bvec.bvecReader(featureFile, labelFile)
- elif fileformat.lower() == 'atrack':
- import reader_atrack
- return reader_atrack.atrackReader(featureFile, labelFile)
- elif fileformat.lower() == 'kaldi':
- import reader_kaldi
- return reader_kaldi.kaldiReader(featureFile, labelFile)
- else:
- msg = "Error: Specified format '{}' is not supported".format(fileformat)
- raise Exception(msg)
diff --git a/example/speech-demo/io_func/feat_readers/reader_atrack.py b/example/speech-demo/io_func/feat_readers/reader_atrack.py
deleted file mode 100644
index e8db0fd..0000000
--- a/example/speech-demo/io_func/feat_readers/reader_atrack.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import numpy
-import numpy as num
-import stats
-from common import *
-
-class atrackReader(BaseReader):
- def __init__(self, featureFile, labelFile, byteOrder=None):
- BaseReader.__init__(self, featureFile, labelFile, byteOrder)
-
- def checkHeader(self, header):
- assert(header[0] == 0x56782)
- assert(header[1] == header[6]) # and header[1] == frameSize)
- assert(header[2] == header[5]) # and header[2] >= numSamples)
- assert(header[3] == 0)
- assert(header[4] == 24) # size of float + 20
- assert(header[4])
-
- def Read(self):
- # flip both the header and data using >
- # atrack format...
- """
- 0.000000 354178 -2107177728
- 0.000000 1845 889651200
- 0.000000 1124588 -332918528
- 0.000000 0 0
- 0.000000 24 402653184
- 0.000000 1124588 -332918528
- 0.000000 1845 889651200
- -2.395848 -1072081519 -1856693824
- -1.677172 -1076449904 -1867655489
- -1.562828 -1077409088 -1073035073
- """
-
- f = open(self.featureFile, "rb")
- header = num.fromfile(f, dtype=num.dtype('>i4'), count=7)
- self.checkHeader(header)
-
- frameSize = header[1]
- numSamples = header[2]
-
- a = num.fromfile(f, dtype=num.dtype('>f4'), count=numSamples*frameSize)
- f.close()
-
- a = a.astype(num.float32)
- a = a.reshape((numSamples, frameSize))
-
- self._markDone()
-
- return a, ReadLabel(self.labelFile)
diff --git a/example/speech-demo/io_func/feat_readers/reader_bvec.py b/example/speech-demo/io_func/feat_readers/reader_bvec.py
deleted file mode 100644
index 3a0f745..0000000
--- a/example/speech-demo/io_func/feat_readers/reader_bvec.py
+++ /dev/null
@@ -1,47 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import struct
-import array
-import numpy
-from common import *
-
-class bvecReader(BaseReader):
-
- def __init__(self, featureFile, labelFile, byteOrder=None):
- BaseReader.__init__(self, featureFile, labelFile, byteOrder)
-
- def Read(self):
-
- with open(self.featureFile,"rb") as f:
-
- dt = numpy.dtype([('numSamples',(numpy.int32,1)),('dim',(numpy.int32,1))])
- header = numpy.fromfile(f,dt.newbyteorder('>'),count=1)
-
- numSamples = header[0]['numSamples']
- dim = header[0]['dim']
-
- print('Num samples = {}'.format(numSamples))
- print('dim = {}'.format(dim))
-
- dt = numpy.dtype([('sample',(numpy.float32,dim))])
- samples = numpy.fromfile(f,dt.newbyteorder('>'),count=numSamples)
-
- self._markDone()
-
- return samples[:]['sample'], ReadLabel(self.labelFile)
diff --git a/example/speech-demo/io_func/feat_readers/reader_htk.py b/example/speech-demo/io_func/feat_readers/reader_htk.py
deleted file mode 100644
index dca24d9..0000000
--- a/example/speech-demo/io_func/feat_readers/reader_htk.py
+++ /dev/null
@@ -1,54 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import numpy
-import stats
-from common import *
-
-class htkReader(BaseReader):
- def __init__(self, featureFile, labelFile, byteOrder=None):
- BaseReader.__init__(self, featureFile, labelFile, byteOrder)
-
- def Read(self):
-
- #return numpy.ones((256, 819)).astype('float32'), numpy.ones(256).astype('int32')
-
- with open(self.featureFile,"rb") as f:
-
- dt = numpy.dtype([('numSamples',(numpy.int32,1)),('sampPeriod',(numpy.int32,1)),('sampSize',(numpy.int16,1)),('sampKind',(numpy.int16,1))])
- header = numpy.fromfile(f,dt.newbyteorder('>' if self.byteOrder==ByteOrder.BigEndian else '<'),count=1)
-
- numSamples = header[0]['numSamples']
- sampPeriod = header[0]['sampPeriod']
- sampSize = header[0]['sampSize']
- sampKind = header[0]['sampKind']
-
- # print 'Num samples = {}'.format(numSamples)
- # print 'Sample period = {}'.format(sampPeriod)
- # print 'Sample size = {}'.format(sampSize)
- # print 'Sample kind = {}'.format(sampKind)
- dt = numpy.dtype([('sample',(numpy.float32,sampSize/4))])
- samples = numpy.fromfile(f,dt.newbyteorder('>' if self.byteOrder==ByteOrder.BigEndian else '<'),count=numSamples)
-
- self._markDone()
-
- if self.labelFile is None:
- labels = None
- else:
- labels = ReadLabel(self.labelFile)
-
- return samples[:]['sample'], labels
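Note on the removed HTK reader: `htkReader.Read` above parsed the 12-byte header (sample count, sample period, bytes per sample, parameter kind) and then read `sampSize/4` float32 values per frame. Under Python 3 that division yields a float, so a port would need `//`. The sketch below is a minimal Python 3 version, assuming big-endian files holding uncompressed float32 data; `read_htk` and `HTK_HEADER` are hypothetical names.

```python
import numpy as np

# Header layout as read by the removed code, fixed here to big-endian.
HTK_HEADER = np.dtype([
    ("numSamples", ">i4"),
    ("sampPeriod", ">i4"),
    ("sampSize", ">i2"),
    ("sampKind", ">i2"),
])


def read_htk(path):
    """Return (features, header) for a big-endian float32 HTK file."""
    with open(path, "rb") as f:
        header = np.fromfile(f, dtype=HTK_HEADER, count=1)[0]
        num_samples = int(header["numSamples"])
        dim = int(header["sampSize"]) // 4   # 4 bytes per float32 value
        data = np.fromfile(f, dtype=">f4", count=num_samples * dim)
    # Convert to native byte order for downstream use.
    return data.reshape(num_samples, dim).astype(np.float32), header
```

Reading the payload as a flat `>f4` array and reshaping avoids building a second structured dtype whose field width depends on the header.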
diff --git a/example/speech-demo/io_func/feat_readers/reader_kaldi.py b/example/speech-demo/io_func/feat_readers/reader_kaldi.py
deleted file mode 100644
index 345934a..0000000
--- a/example/speech-demo/io_func/feat_readers/reader_kaldi.py
+++ /dev/null
@@ -1,154 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from common import *
-
-import random
-import time
-
-import ctypes
-import numpy
-import sys
-import re
-
-c_float_ptr = ctypes.POINTER(ctypes.c_float)
-c_int_ptr = ctypes.POINTER(ctypes.c_int)
-c_void_p = ctypes.c_void_p
-c_int = ctypes.c_int
-c_char_p = ctypes.c_char_p
-c_float = ctypes.c_float
-
-kaldi = ctypes.cdll.LoadLibrary("libkaldi-python-wrap.so") # this needs to be in LD_LIBRARY_PATH
-
-def decl(f, restype, argtypes):
- f.restype = restype
- if argtypes is not None and len(argtypes) != 0:
- f.argtypes = argtypes
-
-decl(kaldi.SBFMReader_new, c_void_p, [])
-decl(kaldi.SBFMReader_new_char, c_void_p, [c_char_p])
-decl(kaldi.SBFMReader_Open, c_int, [c_void_p, c_char_p])
-decl(kaldi.SBFMReader_Done, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Key, c_char_p, [c_void_p])
-decl(kaldi.SBFMReader_FreeCurrent, None, [c_void_p])
-decl(kaldi.SBFMReader_Value, c_void_p, [c_void_p])
-decl(kaldi.SBFMReader_Next, None, [c_void_p])
-decl(kaldi.SBFMReader_IsOpen, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Close, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Delete, None, [c_void_p])
-
-decl(kaldi.MatrixF_NumRows, c_int, [c_void_p])
-decl(kaldi.MatrixF_NumCols, c_int, [c_void_p])
-decl(kaldi.MatrixF_Stride, c_int, [c_void_p])
-decl(kaldi.MatrixF_cpy_to_ptr, None, [c_void_p, c_float_ptr, c_int])
-decl(kaldi.MatrixF_SizeInBytes, c_int, [c_void_p])
-decl(kaldi.MatrixF_Data, c_float_ptr, [c_void_p])
-
-decl(kaldi.RAPReader_new_char, c_void_p, [c_char_p])
-decl(kaldi.RAPReader_HasKey, c_int, [c_void_p, c_char_p])
-decl(kaldi.RAPReader_Value, c_int_ptr, [c_void_p, c_char_p])
-decl(kaldi.RAPReader_DeleteValue, None, [c_void_p, c_int_ptr])
-decl(kaldi.RAPReader_Delete, None, [c_void_p])
-
-decl(kaldi.Nnet_new, c_void_p, [c_char_p, c_float, c_int])
-decl(kaldi.Nnet_Feedforward, c_void_p, [c_void_p, c_void_p])
-decl(kaldi.Nnet_Delete, None, [c_void_p])
-
-class kaldiReader(BaseReader):
- def __init__(self, featureFile, labelFile, byteOrder=None):
- BaseReader.__init__(self, featureFile, labelFile, byteOrder)
-
- arr = re.split('\s+', featureFile, maxsplit=1)
- if len(arr) != 2:
- raise Exception("two items required in featureFile line: <transform> <rspecifier>")
- feature_transform, featureFile = arr
- if feature_transform == "NO_FEATURE_TRANSFORM":
- feature_transform = None
-
- self.feature_rspecifier = featureFile
- self.targets_rspecifier = labelFile
- self.feature_reader = kaldi.SBFMReader_new_char(self.feature_rspecifier)
-
- if self.targets_rspecifier is not None:
- self.targets_reader = kaldi.RAPReader_new_char(self.targets_rspecifier)
- if feature_transform is not None:
- self.nnet_transf = kaldi.Nnet_new(feature_transform, ctypes.c_float(1.0), 1)
- else:
- self.nnet_transf = None
-
- def Cleanup(self):
- kaldi.SBFMReader_Delete(self.feature_reader)
- if self.targets_rspecifier is not None:
- kaldi.RAPReader_Delete(self.targets_reader)
- if self.nnet_transf is not None:
- kaldi.Nnet_Delete(self.nnet_transf)
-
- def Read(self):
- if kaldi.SBFMReader_Done(self.feature_reader):
- self._markDone()
- return None
- utt = kaldi.SBFMReader_Key(self.feature_reader)
- self.utt_id = utt
-
- #return numpy.ones((256, 819)).astype('float32'), numpy.ones(256).astype('int32')
-
- feat_value = kaldi.SBFMReader_Value(self.feature_reader)
- if self.nnet_transf is not None:
- feat_value = kaldi.Nnet_Feedforward(self.nnet_transf, feat_value)
- feat_rows = kaldi.MatrixF_NumRows(feat_value)
- feat_cols = kaldi.MatrixF_NumCols(feat_value)
- feat_data = kaldi.MatrixF_Data(feat_value)
-
- # never use numpy.ndarray(buf=) or numpy.ctypeslib.as_array
- # because you don't know if Python or C owns buffer
- # (even if you numpy.copy() resulting array)
- # http://stackoverflow.com/questions/4355524/getting-data-from-ctypes-array-into-numpy
- #
- # Can't use memmove/memcpy because arrays are strided
- # Use special function -_-
-
- feats = numpy.empty((feat_rows,feat_cols), dtype=numpy.float32)
- # MUST: cast Python int to pointer, otherwise C interprets as 32-bit
- # if you print the pointer value before casting, you might see weird value before seg fault
- # casting fixes that
- feats_numpy_ptr = ctypes.cast(feats.ctypes.data, c_float_ptr)
- kaldi.MatrixF_cpy_to_ptr(feat_value, feats_numpy_ptr, feats.strides[0]/4)
-
- if self.targets_rspecifier is not None:
- if kaldi.RAPReader_HasKey(self.targets_reader, utt):
- tgt_value = kaldi.RAPReader_Value(self.targets_reader, utt)
-
- tgts = numpy.empty((feat_rows,), dtype=numpy.int32)
- # ok to use memmove because this is 1-dimensional array I made in C (no stride)
- tgts_numpy_ptr = ctypes.cast(tgts.ctypes.data, c_int_ptr)
- ctypes.memmove(tgts_numpy_ptr, tgt_value, 4 * feat_rows)
-
- kaldi.RAPReader_DeleteValue(self.targets_reader, tgt_value)
- else:
- tgts = None
- else:
- tgts = None
-
- kaldi.SBFMReader_Next(self.feature_reader)
-
- #print "FEATS:", feats[0:5][0:5]
- #print "TGTS :", tgts[0:5]
-
- return feats, tgts
-
- def GetUttId(self):
- return self.utt_id
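Note on the removed Kaldi reader: `kaldiReader.Read` above copied C-owned buffers into NumPy-owned arrays rather than wrapping them, so that freeing the Kaldi object could not invalidate the result. The sketch below shows that copy pattern for a contiguous int32 buffer. A plain `ctypes` array stands in for the pointer the wrapper library returned, so the library itself is not needed to run it; `copy_int_buffer` is a hypothetical name, and the sketch assumes a platform where C `int` is 32 bits.

```python
import ctypes

import numpy as np

c_int_ptr = ctypes.POINTER(ctypes.c_int)


def copy_int_buffer(src_ptr, count):
    """Copy `count` C ints from src_ptr into a new NumPy-owned array."""
    out = np.empty((count,), dtype=np.int32)
    # Cast the integer address to a typed pointer before handing it to C.
    dst_ptr = ctypes.cast(out.ctypes.data, c_int_ptr)
    ctypes.memmove(dst_ptr, src_ptr, ctypes.sizeof(ctypes.c_int) * count)
    return out


c_buffer = (ctypes.c_int * 4)(7, 8, 9, 10)   # stand-in for C-owned memory
labels = copy_int_buffer(ctypes.cast(c_buffer, c_int_ptr), 4)
```

As the removed comments point out, this only works for unstrided data; the feature matrix needed a dedicated copy routine because its rows were strided.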
diff --git a/example/speech-demo/io_func/feat_readers/stats.py b/example/speech-demo/io_func/feat_readers/stats.py
deleted file mode 100644
index a2c8473..0000000
--- a/example/speech-demo/io_func/feat_readers/stats.py
+++ /dev/null
@@ -1,162 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import numpy
-
-class _StreamVariance(object):
-
- def __init__(self,nCols):
- self.n = 0;
- self.mean = numpy.zeros(nCols)
- self.M2 = numpy.zeros(nCols)
-
- def AddX(self,value):
- # do not operate in the same way when the input is an 1
- # dimension array or a 2 dimension array. Maybe there is
- # a better way to handle that
- if len(value.shape) == 2:
- for x in value:
- self.n = self.n+1
- delta = x-self.mean
- self.mean = self.mean+delta/self.n
- self.M2 = self.M2+delta*(x-self.mean)
- elif len(value.shape) == 1:
- self.n = self.n+1
- delta = value-self.mean
- self.mean = self.mean+delta/self.n
- self.M2 = self.M2+delta*(value-self.mean)
- else:
- msg = 'Only 1D and 2D array are supported'
- raise Exception(msg)
-
- def GetMean(self):
- return self.mean
-
- def GetVariance(self):
- return self.M2/(self.n-1)
-
- def GetInvStandardDeviation(self):
- return 1.0/(numpy.sqrt(self.M2/(self.n-1)))
-
- def GetNumberOfSamples(self):
- return self.n
-
-class FeatureStats(object):
-
- def __init__(self):
- self.mean = numpy.zeros(1,)
- self.invStd = numpy.zeros(1,)
- self.populationSize = 0
- self.dim = None
-
- def GetMean(self):
- return self.mean
-
- def GetVariance(self):
- return numpy.power(self.GetStd(), 2)
-
- def GetStd(self):
- return 1.0/self.invStd
-
- def GetInvStd(self):
- return self.invStd
-
- """
-
- def GetStatsFromList(self,fileList,featureFileHandler):
- stats = None
-
- for featureFile,label in featureList.FeatureList(fileList):
- if stats is None:
- self.dim = self.getDimFromFile(featureFile,featureFileHandler)
- stats = _StreamVariance(self.dim)
-
- samples = featureFileHandler.Read(featureFile)
-
- print('Process file : "{}"'.format(featureFile))
- stats.AddX(samples)
-
- print('Read {} samples'.format(stats.GetNumberOfSamples()))
- self.mean = stats.GetMean()
- self.invStd = stats.GetInvStandardDeviation()
- self.populationSize = stats.GetNumberOfSamples()
-
- return (self.mean,self.invStd)
-
- def GetStatsFromFile(self,featureFile,featureFileHandler):
- self.dim = self.getDimFromFile(featureFile,featureFileHandler)
- stats = _StreamVariance(self.dim)
-
- samples = featureFileHandler.Read(featureFile)
- stats.AddX(samples)
- self.mean = stats.GetMean()
- self.invStd = stats.GetInvStandardDeviation()
- self.populationSize = stats.GetNumberOfSamples()
-
- return (self.mean,self.invStd)
-
- def getDimFromFile(self,featureFile,featureFileHandler):
- return featureFileHandler.GetDim(featureFile)
-
- """
-
- def Load(self,filename):
- with open(filename,"rb") as f:
- dt = numpy.dtype([('magicNumber',(numpy.int32,1)),('numSamples',(numpy.int32,1)),('dim',(numpy.int32,1))])
- header = numpy.fromfile(f,dt,count=1)
-
- if header[0]['magicNumber'] != 21812:
- msg = 'File {} is not a stat file (wrong magic number)'
- raise Exception(msg)
-
- self.populationsize = header[0]['numSamples']
- dim = header[0]['dim']
-
- dt = numpy.dtype([('stats',(numpy.float32,dim))])
- self.mean = numpy.fromfile(f,dt,count=1)[0]['stats']
- self.invStd = numpy.fromfile(f,dt,count=1)[0]['stats']
-
- def Save(self,filename):
- with open(filename,'wb') as f:
- dt = numpy.dtype([('magicNumber',(numpy.int32,1)),('numSamples',(numpy.int32,1)),('dim',(numpy.int32,1))])
- header=numpy.zeros((1,),dtype=dt)
- header[0]['magicNumber'] = 21812
- header[0]['numSamples'] = self.populationSize
- header[0]['dim'] = self.mean.shape[0]
- header.tofile(f)
-
- self.mean.astype(numpy.float32).tofile(f)
- self.invStd.astype(numpy.float32).tofile(f)
-
-if __name__ == '__main__':
-
- import argparse
-
- parser = argparse.ArgumentParser(description='Print the mean and standard deviation from a stat file',formatter_class=argparse.ArgumentDefaultsHelpFormatter)
- parser.add_argument('filename', help="Name of the stat file")
- args = parser.parse_args()
- featureStats = FeatureStats()
- featureStats.Load(args.filename)
-
- numpy.set_printoptions(threshold='nan')
- print("THIS IS THE MEAN: ")
- print(featureStats.GetMean())
- print("THIS IS THE INVERSE STD: ")
- print(featureStats.GetInvStd())
-
-
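Note on the removed stats.py: the _StreamVariance class above keeps a running mean and a running sum of squared deviations (M2), updated one sample at a time. This is the update commonly attributed to Welford. The sketch below restates that update for scalar samples in plain Python. The name RunningStats and its method names are illustrative assumptions, not part of the removed file. Unlike the removed code, it refuses to divide by n-1 when fewer than two samples have been seen.

```python
import math


class RunningStats(object):
    """Scalar running mean/variance (hypothetical name, for illustration)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        # Same three-step update as _StreamVariance.AddX for a 1D sample.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two samples.
        if self.n < 2:
            raise ValueError("need at least two samples")
        return self.m2 / (self.n - 1)

    def inv_std(self):
        return 1.0 / math.sqrt(self.variance())


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
print(stats.mean, stats.variance())
```

The appeal of this form is that it needs a single pass and constant memory, which suits statistics gathered over many feature files.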
diff --git a/example/speech-demo/io_func/feat_readers/writer_kaldi.py b/example/speech-demo/io_func/feat_readers/writer_kaldi.py
deleted file mode 100644
index 0f8fb93..0000000
--- a/example/speech-demo/io_func/feat_readers/writer_kaldi.py
+++ /dev/null
@@ -1,74 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import sys
-import numpy
-import struct
-import subprocess
-import os
-
-# Functions to read and write Kaldi binary-formatted .scp and .ark
-
-class KaldiWriteOut(object):
-
- def __init__(self, scp_path, ark_path):
-
- self.ark_path = ark_path
- self.scp_path = scp_path
- self.out_ark = None
- self.out_scp = None
- if sys.byteorder != 'little':
- raise Exception("output file needs to be little endian")
-
- def open(self):
- self.out_ark = open(self.ark_path, "w")
- self.out_scp = open(self.scp_path, "w")
-
- def open_or_fd(self):
- offset = None
- if self.ark_path[0] == '|':
- #self.out_ark = os.popen(sys.stdout, 'wb')
- self.out_ark = sys.stdout
- else:
- self.out_ark = open(self.ark_path, "w")
- def write(self, uttID, data):
- assert data.dtype == numpy.float32
-
- self.out_ark.write(uttID + ' ')
- if self.out_scp is not None:
- start_offset = self.out_ark.tell()
-
- # write out ark
- num_row, num_col = data.shape
- self.out_ark.write('\0B')
- self.out_ark.write('FM ')
- self.out_ark.write(chr(4))
- self.out_ark.write(struct.pack('i', num_row))
- self.out_ark.write(chr(4))
- self.out_ark.write(struct.pack('i', num_col))
- data.tofile(self.out_ark)
- self.out_ark.flush()
-
- # write out scp
- if self.out_scp is not None:
- scp_out = uttID + ' ' + self.ark_path + ':' + str(start_offset)
- self.out_scp.write(scp_out + '\n')
-
- def close(self):
- self.out_ark.close()
- if self.out_scp is not None:
- self.out_scp.close()
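Note on the removed writer_kaldi.py: KaldiWriteOut.write emits, in order, the utterance key and a space, the two-byte marker '\0B', the token 'FM ', then a size byte 4 followed by the row count, a size byte 4 followed by the column count, and finally the raw float32 data. The removed code opens its files in text mode and writes struct.pack output directly, which works on Python 2 but would likely need binary mode under Python 3. The sketch below reproduces the same byte layout with bytes objects. The function name write_kaldi_matrix and the list-of-lists input are assumptions made for illustration.

```python
import io
import struct


def write_kaldi_matrix(stream, utt_id, rows):
    """Write one float32 matrix record; return the offset after the key."""
    num_row = len(rows)
    num_col = len(rows[0]) if rows else 0
    stream.write(utt_id.encode("ascii") + b" ")
    offset = stream.tell()  # position an .scp entry would point at
    stream.write(b"\0B")  # binary-mode marker
    stream.write(b"FM ")  # float matrix token
    stream.write(b"\x04" + struct.pack("<i", num_row))
    stream.write(b"\x04" + struct.pack("<i", num_col))
    for row in rows:
        # Little-endian float32, row by row.
        stream.write(struct.pack("<%df" % num_col, *row))
    return offset


buf = io.BytesIO()
start = write_kaldi_matrix(buf, "utt1", [[1.0, 2.0], [3.0, 4.0]])
record = buf.getvalue()
```

The explicit "<" in the struct formats pins the byte order, so the sketch does not depend on the sys.byteorder check that the removed constructor performs.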
diff --git a/example/speech-demo/io_func/info.py b/example/speech-demo/io_func/info.py
deleted file mode 100644
index eaf95ab..0000000
--- a/example/speech-demo/io_func/info.py
+++ /dev/null
@@ -1,23 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import os
-
-_mydir = os.path.dirname(__file__) or '.'
-
-ROOT = os.path.abspath(os.path.join(_mydir, "../.."))
-CONFIGS = os.path.join(ROOT, "configs")
diff --git a/example/speech-demo/io_func/kaldi_parser.py b/example/speech-demo/io_func/kaldi_parser.py
deleted file mode 100644
index 10a373d..0000000
--- a/example/speech-demo/io_func/kaldi_parser.py
+++ /dev/null
@@ -1,219 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import struct
-import numpy as num
-import sys
-
-class KaldiParser(object):
-
- NO_OPEN_BRACKET = "found > before <"
- ERR_NO_CLOSE_BRACKET = "reached eof before >"
- ERR_BYTES_BEFORE_TOKEN = "found bytes before <"
- NO_SPACE_AFTER = "missing space after >"
-
- def __init__(self, f):
- self.f = f
- self.binary = self.f.read(2) == '\0B'
- assert(self.binary), "text format not supported yet"
- if not self.binary:
- self.f.seek(0, 0)
-
- def is_binary(self):
- return self.binary
-
- def try_next_token(self):
- pos = self.f.tell()
- err, tok = self.next_token()
- if err is not None:
- self.f.seek(pos, 0)
- print(err, tok)
- return None
- return tok.lower()
-
- def next_token(self):
- # keep reading until you get a > or at end of file (return None)
- # consume the space
- # return substring from < to >
- # if things before < are not space, return error
- buf = ""
- while True:
- b = self.f.read(1)
- if b is None:
- return KaldiParser.ERR_NO_CLOSE_BRACKET, None
- buf += b
- if b == ">":
- break
-
- try:
- start = buf.index("<")
- except ValueError:
- return KaldiParser.NO_OPEN_BRACKET, None
-
- b = self.f.read(1)
- if not (b == " " or b is None):
- return KaldiParser.NO_SPACE_AFTER, buf[start:]
-
- if start != 0:
- return KaldiParser.ERR_BYTES_BEFORE_TOKEN, buf[start:]
-
- return None, buf
-
- def read_space(self):
- b = self.f.read(1)
- assert(b == " " or b is None)
-
- # http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
- def read_basic_type(self, type):
- if self.binary:
- size = num.fromfile(self.f, dtype=num.dtype("i1"), count=1)[0]
-
- if type == "int":
- dtype = "<i4"
- dsize = 4
- elif type == "float":
- dtype = "<f4"
- dsize = 4
- elif type == "char":
- dtype = 'a'
- dsize = 1
- else:
- print("unrecognized type")
- return None
-
- assert(size == dsize)
- n = num.fromfile(self.f, dtype=num.dtype(dtype), count=1)
- return n[0]
-
- else:
- assert(False), "not supported yet"
-
- def read_matrix(self):
- mode = self.f.read(2)
- #print mode
- assert(mode == 'FM')
- self.read_space()
-
- rows = self.read_basic_type("int")
- #print "rows", rows
- cols = self.read_basic_type("int")
- #print "cols", cols
-
- n = num.fromfile(self.f, dtype=num.dtype("<f4"), count=rows * cols)
- n = n.reshape((rows, cols))
-
- #print n[0][0]
- #print "-----------"
- return n
-
- def read_vector(self):
- mode = self.f.read(2)
- #print mode
- assert(mode == 'FV')
- self.read_space()
-
- length = self.read_basic_type("int")
- #print "length", length
-
- n = num.fromfile(self.f, dtype=num.dtype("<f4"), count=length)
- #print n[0]
- #print "-----------"
- return n
-
-def fileIsBinary(filename):
- f = open(filename, "rb")
- binary = (f.read(2) == '\0B')
- f.seek(0, 0)
- return binary
-
-def file2nnet_binary(filename):
- f = open(filename, "rb")
- parser = KaldiParser(f)
-
- net = []
- layer = None
- while True:
- tok = parser.try_next_token()
- if tok is None:
- print("error")
- break
- if tok == "<nnet>":
- continue
- elif tok == "<affinetransform>":
- if layer is not None:
- net += [layer]
- layer = {}
- layer["outdim"] = parser.read_basic_type("int")
- layer["indim"] = parser.read_basic_type("int")
- elif tok == "<learnratecoef>":
- parser.read_basic_type("float")
- elif tok == "<biaslearnratecoef>":
- parser.read_basic_type("float")
- elif tok == "<maxnorm>":
- parser.read_basic_type("float")
- layer["weights"] = parser.read_matrix().transpose() # kaldi writes the transpose!!!!
- layer["bias"] = parser.read_vector()
- elif tok == "<sigmoid>" or tok == "<softmax>":
- layer["type"] = tok[1:-1]
- outdim1 = parser.read_basic_type("int")
- outdim2 = parser.read_basic_type("int")
- assert(outdim1 == outdim2 and outdim2 == layer["outdim"])
- elif tok == "</nnet>":
- #print "Done!"
- break
- else:
- print("unrecognized token", tok)
- break
-
- if layer is not None:
- net += [layer]
-
- #for layer in net:
- # print layer.keys()
-
- return net
-
-if __name__ == '__main__':
- filename = "exp/dnn4_pretrain-dbn_dnn/nnet_6.dbn_dnn.init"
- #filename = "/usr/users/leoliu/s5/exp/dnn4_pretrain-dbn_dnn/final.feature_transform"
- print(filename)
-
- print("isBinary:", fileIsBinary(filename))
- a = file2nnet_binary(filename)
-
-
-
- """
- while True:
- err, tok = parser.next_token()
- if err != KaldiParser.NO_SPACE_AFTER and tok is not None:
- print(err, tok)
- """
-
-"""
- fout.write('<affinetransform> ' + str(output_size) + ' ' + str(input_size) + '\n')
- fout.write('[' + '\n')
- for x in xrange(output_size):
- fout.write(W_layer[x].strip() + '\n')
- fout.write(']' + '\n')
- fout.write('[ ' + b_layer.strip() + ' ]' + '\n')
- if maxout:
- fout.write('<maxout> ' + str(int(layers[i + 1])) + ' ' + str(output_size) + '\n')
- else:
- fout.write('<sigmoid> ' + str(output_size) + ' ' + str(output_size) + '\n')
-"""
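Note on the removed kaldi_parser.py: KaldiParser.next_token reads bytes up to the next '>', requires a single space (or end of file) after it, and reports an error if anything precedes the '<'. Python file objects signal end of file by returning an empty string or bytes object from read(), not None, so the `b is None` checks in the removed code would not fire at end of file. The sketch below shows the same tokenizing rules with an explicit end-of-file test. It is a standalone function over a byte stream, written for illustration, and is not the removed method.

```python
import io


def next_token(stream):
    """Return (error, token) for the next <...> token in a byte stream."""
    buf = b""
    while True:
        b = stream.read(1)
        if b == b"":  # read() signals end of file with an empty result
            return "reached eof before >", None
        buf += b
        if b == b">":
            break
    start = buf.find(b"<")
    if start < 0:
        return "found > before <", None
    after = stream.read(1)  # consume the separator that follows the token
    if after not in (b" ", b""):
        return "missing space after >", buf[start:].decode("ascii")
    if start != 0:
        return "found bytes before <", buf[start:].decode("ascii")
    return None, buf.decode("ascii")


stream = io.BytesIO(b"<Nnet> <AffineTransform> ")
first = next_token(stream)
second = next_token(stream)
third = next_token(stream)
```

Returning an (error, token) pair mirrors the removed code, which lets a caller such as try_next_token rewind the stream when a token cannot be read.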
diff --git a/example/speech-demo/io_func/model_io.py b/example/speech-demo/io_func/model_io.py
deleted file mode 100755
index 8b6e043..0000000
--- a/example/speech-demo/io_func/model_io.py
+++ /dev/null
@@ -1,275 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import numpy as np
-import os
-import sys
-import logging
-
-from StringIO import StringIO
-import json
-
-
-from datetime import datetime
-
-from kaldi_parser import *
-import utils.utils as utils
-
-# nicer interface for file2nnet, nnet2file
-
-def load(model, filename, gradients, num_hidden_layers=-1, with_final=True, factors=None):
- _file2nnet(model.sigmoid_layers, set_layer_num = num_hidden_layers,
- filename=filename, activation="sigmoid", withfinal=with_final, factor=1.0, gradients=gradients, factors=factors)
-
-def save(model, filename):
- _nnet2file(model.sigmoid_layers, set_layer_num = -1, filename=filename,
- activation="sigmoid", start_layer = 0, withfinal=True)
-
-# convert an array to a string
-def array_2_string(array):
- return array.astype('float32')
-
-# convert a string to an array
-def string_2_array(string):
- if isinstance(string, str) or isinstance(string, unicode):
- str_in = StringIO(string)
- return np.loadtxt(str_in)
- else:
- return string
-
-def _nnet2file(layers, set_layer_num = -1, filename='nnet.out', activation='sigmoid', start_layer = 0, withfinal=True, input_factor = 0.0, factor=[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]):
- logger = logging.getLogger(__name__)
- logger.info("Saving network "+filename)
-
- n_layers = len(layers)
- nnet_dict = {}
- if set_layer_num == -1:
- set_layer_num = n_layers - 1
-
- for i in range(start_layer, set_layer_num):
- logger.info("Saving hidden layer "+str(i))
- dict_a = str(i) + ' ' + activation + ' W'
- if i == 0:
- nnet_dict[dict_a] = array_2_string((1.0 - input_factor) * layers[i].params[0].get_value())
- else:
- nnet_dict[dict_a] = array_2_string((1.0 - factor[i-1]) * layers[i].params[0].get_value())
- dict_a = str(i) + ' ' + activation + ' b'
- nnet_dict[dict_a] = array_2_string(layers[i].params[1].get_value())
-
- # gradients
- dict_a = str(i) + ' ' + activation + ' dW'
- nnet_dict[dict_a] = array_2_string(layers[i].delta_params[0].get_value())
- dict_a = str(i) + ' ' + activation + ' db'
- nnet_dict[dict_a] = array_2_string(layers[i].delta_params[1].get_value())
-
- if layers[i].kahan:
- logger.info("Loading hidden kahan")
- dict_a = str(i) + ' ' + activation + ' W_carry'
- nnet_dict[dict_a] = array_2_string(layers[i].params_carry[0].get_value())
- dict_a = str(i) + ' ' + activation + ' b_carry'
- nnet_dict[dict_a] = array_2_string(layers[i].params_carry[1].get_value())
- #dict_a = str(i) + ' ' + activation + ' dW_carry'
- #nnet_dict[dict_a] = array_2_string(layers[i].delta_params_carry[0].get_value())
- #dict_a = str(i) + ' ' + activation + ' db_carry'
- #nnet_dict[dict_a] = array_2_string(layers[i].delta_params_carry[1].get_value())
-
- if withfinal:
- logger.info("Saving final layer ")
-
- dict_a = 'logreg W'
- nnet_dict[dict_a] = array_2_string((1.0 - factor[-1]) * layers[-1].params[0].get_value())
- dict_a = 'logreg b'
- nnet_dict[dict_a] = array_2_string(layers[-1].params[1].get_value())
-
- #gradients
- dict_a = 'logreg dW'
- nnet_dict[dict_a] = array_2_string(layers[-1].delta_params[0].get_value())
- dict_a = 'logreg db'
- nnet_dict[dict_a] = array_2_string(layers[-1].delta_params[1].get_value())
-
- if layers[-1].kahan:
- logger.info("Loading softmax kahan")
- dict_a = 'logreg W_carry'
- nnet_dict[dict_a] = array_2_string(layers[-1].params_carry[0].get_value())
- dict_a = 'logreg b_carry'
- nnet_dict[dict_a] = array_2_string(layers[-1].params_carry[1].get_value())
- #dict_a = 'logreg dW_carry'
- #nnet_dict[dict_a] = array_2_string(layers[-1].delta_params_carry[0].get_value())
- #dict_a = 'logreg db_carry'
- #nnet_dict[dict_a] = array_2_string(layers[-1].delta_params_carry[1].get_value())
-
- utils.pickle_save(nnet_dict, filename)
-
-def zero(x):
- x.set_value(np.zeros_like(x.get_value(borrow=True), dtype=theano.config.floatX))
-
-def _file2nnet(layers, set_layer_num = -1, filename='nnet.in', activation='sigmoid', withfinal=True, factor=1.0, gradients=False, factors=None):
- logger = logging.getLogger(__name__)
- logger.info("Loading "+filename)
-
- # if is KALDI binary
- if fileIsBinary(filename):
- print("Warning dropout factors ignored here")
-
- nnet = file2nnet_binary(filename)
-
- n_layers = len(nnet)
- if set_layer_num == -1:
- set_layer_num = n_layers - 1
-
- for i in xrange(set_layer_num):
- layers[i].params[0].set_value(factor * nnet[i]["weights"].astype(dtype=theano.config.floatX))
- layers[i].params[1].set_value(nnet[i]["bias"].astype(dtype=theano.config.floatX))
-
- if withfinal:
- #print(nnet[-1]["weights"][0][0:10])
- layers[-1].params[0].set_value(nnet[-1]["weights"].astype(dtype=theano.config.floatX))
- layers[-1].params[1].set_value(nnet[-1]["bias"].astype(dtype=theano.config.floatX))
-
- return
-
- # else, it's pdnn format
-
- n_layers = len(layers)
-
- if factors is None:
- factors = [1.0 for l in layers]
-
- if len(factors) != n_layers:
- raise Exception("number of factors does not equal number of hidden + softmax")
-
- nnet_dict = {}
- if set_layer_num == -1:
- set_layer_num = n_layers - 1
-
- nnet_dict = utils.pickle_load(filename)
-
- for i in xrange(set_layer_num):
- logger.info("Loading hidden layer "+str(i))
-
- dict_key = str(i) + ' ' + activation + ' W'
- layers[i].params[0].set_value(factors[i] * factor * np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = str(i) + ' ' + activation + ' b'
- layers[i].params[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
-
- if gradients:
- dict_key = str(i) + ' ' + activation + ' dW'
- layers[i].delta_params[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = str(i) + ' ' + activation + ' db'
- layers[i].delta_params[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- else:
- zero(layers[i].delta_params[0])
- zero(layers[i].delta_params[1])
-
- dict_key = str(i) + ' ' + activation + ' W_carry'
- if layers[i].kahan and dict_key in nnet_dict:
- logger.info("Loading hidden kahan")
- dict_key = str(i) + ' ' + activation + ' W_carry'
- layers[i].params_carry[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = str(i) + ' ' + activation + ' b_carry'
- layers[i].params_carry[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- #dict_key = str(i) + ' ' + activation + ' dW_carry'
- #layers[i].delta_params_carry[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- #dict_key = str(i) + ' ' + activation + ' db_carry'
- #layers[i].delta_params_carry[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
-
- if layers[i].sync:
- layers[i].params_sync[0].set_value(layers[i].params[0].get_value().astype('float32'))
- layers[i].params_sync[1].set_value(layers[i].params[1].get_value().astype('float32'))
- logger.info("Copy params to sync")
-
- if withfinal:
- logger.info("Loading final layer ")
-
- dict_key = 'logreg W'
- layers[-1].params[0].set_value(factors[-1] * np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = 'logreg b'
- layers[-1].params[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- if gradients:
- dict_key = 'logreg dW'
- layers[-1].delta_params[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = 'logreg db'
- layers[-1].delta_params[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- else:
- zero(layers[-1].delta_params[0])
- zero(layers[-1].delta_params[1])
-
- dict_key = 'logreg W_carry'
- if layers[-1].kahan and dict_key in nnet_dict:
- logger.info("Loading softmax kahan")
- dict_key = 'logreg W_carry'
- layers[-1].params_carry[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- dict_key = 'logreg b_carry'
- layers[-1].params_carry[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- #dict_key = 'logreg dW_carry'
- #layers[-1].delta_params_carry[0].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
- #dict_key = 'logreg db_carry'
- #layers[-1].delta_params_carry[1].set_value(np.asarray(string_2_array(nnet_dict[dict_key]), dtype=theano.config.floatX))
-
- if layers[-1].sync:
- layers[-1].params_sync[0].set_value(layers[-1].params[0].get_value().astype('float32'))
- layers[-1].params_sync[1].set_value(layers[-1].params[1].get_value().astype('float32'))
- logger.info("Copy softmax params to sync")
-
- if gradients:
- logger.info("Loading gradients")
- else:
- logger.info("Zero-ing gradients")
-
-def _cnn2file(conv_layers, filename='nnet.out', activation='sigmoid', withfinal=True, input_factor = 1.0, factor=1.0):
- n_layers = len(conv_layers)
- nnet_dict = {}
- for i in xrange(n_layers):
- conv_layer = conv_layers[i]
- filter_shape = conv_layer.filter_shape
-
- for next_X in xrange(filter_shape[0]):
- for this_X in xrange(filter_shape[1]):
- dict_a = 'W ' + str(i) + ' ' + str(next_X) + ' ' + str(this_X)
- if i == 0:
- nnet_dict[dict_a] = array_2_string(input_factor * (conv_layer.W.get_value())[next_X, this_X])
- else:
- nnet_dict[dict_a] = array_2_string(factor * (conv_layer.W.get_value())[next_X, this_X])
-
- dict_a = 'b ' + str(i)
- nnet_dict[dict_a] = array_2_string(conv_layer.b.get_value())
-
- with open(filename, 'wb') as fp:
- json.dump(nnet_dict, fp, indent=2, sort_keys = True)
- fp.flush()
-
-def _file2cnn(conv_layers, filename='nnet.in', activation='sigmoid', withfinal=True, factor=1.0):
- n_layers = len(conv_layers)
- nnet_dict = {}
-
- with open(filename, 'rb') as fp:
- nnet_dict = json.load(fp)
- for i in xrange(n_layers):
- conv_layer = conv_layers[i]
- filter_shape = conv_layer.filter_shape
- W_array = conv_layer.W.get_value()
-
- for next_X in xrange(filter_shape[0]):
- for this_X in xrange(filter_shape[1]):
- dict_a = 'W ' + str(i) + ' ' + str(next_X) + ' ' + str(this_X)
- W_array[next_X, this_X, :, :] = factor * np.asarray(string_2_array(nnet_dict[dict_a]))
-
- conv_layer.W.set_value(W_array)
-
- dict_a = 'b ' + str(i)
- conv_layer.b.set_value(np.asarray(string_2_array(nnet_dict[dict_a]), dtype=theano.config.floatX))
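Note on the removed model_io.py: _nnet2file and _file2nnet share a flat dictionary whose keys are built from the layer index, the activation name, and a suffix (W, b, dW, db), with the final layer stored under 'logreg' keys. The sketch below only illustrates that naming scheme. The helper names layer_keys and final_layer_keys are hypothetical and do not appear in the removed file.

```python
def layer_keys(index, activation="sigmoid"):
    """Keys used for one hidden layer: weights, bias and their gradients."""
    prefix = "%d %s " % (index, activation)
    return [prefix + suffix for suffix in ("W", "b", "dW", "db")]


def final_layer_keys():
    """Keys used for the final (softmax) layer."""
    return ["logreg " + suffix for suffix in ("W", "b", "dW", "db")]


print(layer_keys(0))
print(final_layer_keys())
```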
diff --git a/example/speech-demo/io_func/regr_feat_io.py b/example/speech-demo/io_func/regr_feat_io.py
deleted file mode 100644
index a1737bf..0000000
--- a/example/speech-demo/io_func/regr_feat_io.py
+++ /dev/null
@@ -1,92 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import os
-import sys
-import random
-import shlex
-import time
-import re
-
-from utils.utils import to_bool
-from feat_readers.common import *
-from feat_readers import stats
-from feat_io import DataReadStream
-
-class RegrDataReadStream(object):
-
- def __init__(self, dataset_args, n_ins):
- dataset_args["has_labels"] = False
- assert("seed" in dataset_args)
-
- args1 = dict(dataset_args)
- args2 = dict(dataset_args)
-
- args1["lst_file"] = dataset_args["input_lst_file"]
- args2["lst_file"] = dataset_args["output_lst_file"]
-
- self.input = DataReadStream(args1, n_ins)
- self.output = DataReadStream(args2, n_ins)
-
- def read_by_part(self):
- self.input.read_by_part()
- self.output.read_by_part()
-
- def read_by_matrix(self):
- self.input.read_by_matrix()
- self.output.read_by_matrix()
-
- def make_shared(self):
- self.input.make_shared()
- self.output.make_shared()
-
- def get_shared(self):
- iret = self.input.get_shared()
- oret = self.output.get_shared()
- assert(iret[1] is None)
- assert(oret[1] is None)
- return iret[0], oret[0]
-
- def initialize_read(self):
- self.input.initialize_read()
- self.output.initialize_read()
-
- def current_utt_id(self):
- a = self.input.current_utt_id()
- b = self.output.current_utt_id()
- assert(a == b)
- return a
-
- def load_next_block(self):
- a = self.input.load_next_block()
- b = self.output.load_next_block()
- assert(a == b)
- return a
-
- def get_state(self):
- a = self.input.get_state()
- b = self.output.get_state()
- assert(a[0] == b[0])
- assert(a[2] == b[2])
- assert(a[3] == b[3])
- assert(a[4] == b[4])
- assert(numpy.array_equal(a[1], b[1]))
- return a
-
- def set_state(self, state):
- self.input.set_state(state)
- self.output.set_state(state)
diff --git a/example/speech-demo/io_func/utils.py b/example/speech-demo/io_func/utils.py
deleted file mode 100644
index 4ba8496..0000000
--- a/example/speech-demo/io_func/utils.py
+++ /dev/null
@@ -1,170 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import sys, subprocess, pickle, os, json, logging, socket
-import logging.config
-import datetime
-
-from . import info
-
-def getRunDir():
- return os.path.dirname(os.path.realpath(sys.argv[0]))
-
-def setup_logger(logging_ini):
- if logging_ini is not None:
- print("Using custom logger")
- else:
- logging_ini = os.path.join(info.CONFIGS, 'logging.ini')
-
- logging.config.fileConfig(logging_ini)
- logger = logging.getLogger(__name__)
- logger.info("**************************************************")
- logger.info(datetime.datetime.now().strftime("%Y-%m-%d %H:%M"))
- logger.info("Host: " + str(socket.gethostname()))
- logger.info("Screen: " + os.getenv("STY", "unknown"))
- logger.info("PWD: " + os.getenv("PWD", "unknown"))
- logger.info("Cmd: " + str(sys.argv))
- logger.info("**************************************************")
-
-def to_bool(obj):
- if str(obj).lower() in ["true", "1"]:
- return True
- elif str(obj).lower() in ["false", "0"]:
- return False
- else:
- raise Exception("to_bool: cannot convert to bool")
-
-def line_with_arg(line):
- line = line.strip()
- return line is not "" and not line.startswith("#")
-
-def parse_conv_spec(conv_spec, batch_size):
- # "1x29x29:100,5x5,p2x2:200,4x4,p2x2,f"
- conv_spec = conv_spec.replace('X', 'x')
- structure = conv_spec.split(':')
- conv_layer_configs = []
- for i in range(1, len(structure)):
- config = {}
- elements = structure[i].split(',')
- if i == 1:
- input_dims = structure[i - 1].split('x')
- prev_map_number = int(input_dims[0])
- prev_feat_dim_x = int(input_dims[1])
- prev_feat_dim_y = int(input_dims[2])
- else:
- prev_map_number = conv_layer_configs[-1]['output_shape'][1]
- prev_feat_dim_x = conv_layer_configs[-1]['output_shape'][2]
- prev_feat_dim_y = conv_layer_configs[-1]['output_shape'][3]
-
- current_map_number = int(elements[0])
- filter_xy = elements[1].split('x')
- filter_size_x = int(filter_xy[0])
- filter_size_y = int(filter_xy[1])
- pool_xy = elements[2].replace('p','').replace('P','').split('x')
- pool_size_x = int(pool_xy[0])
- pool_size_y = int(pool_xy[1])
- output_dim_x = (prev_feat_dim_x - filter_size_x + 1) / pool_size_x
- output_dim_y = (prev_feat_dim_y - filter_size_y + 1) / pool_size_y
-
- config['input_shape'] = (batch_size, prev_map_number, prev_feat_dim_x, prev_feat_dim_y)
- config['filter_shape'] = (current_map_number, prev_map_number, filter_size_x, filter_size_y)
- config['poolsize'] = (pool_size_x, pool_size_y)
- config['output_shape'] = (batch_size, current_map_number, output_dim_x, output_dim_y)
- if len(elements) == 4 and elements[3] == 'f':
- config['flatten'] = True
- else:
- config['flatten'] = False
-
- conv_layer_configs.append(config)
- return conv_layer_configs
-
-def _relu(x):
- return x * (x > 0)
-
-def _capped_relu(x):
- return T.minimum(x * (x > 0), 6)
-
-def _linear(x):
- return x * 1.0
-
-def parse_activation(act_str):
- print("***", act_str)
- if act_str == 'sigmoid':
- return T.nnet.sigmoid
- elif act_str == 'tanh':
- return T.tanh
- elif act_str == 'relu':
- return _relu
- elif act_str == 'capped_relu':
- return _capped_relu
- elif act_str == 'linear':
- return _linear
- return T.nnet.sigmoid
-
-def activation_to_txt(act_func):
- if act_func == T.nnet.sigmoid:
- return 'sigmoid'
- if act_func == T.tanh:
- return 'tanh'
-
-def parse_two_integers(argument_str):
- elements = argument_str.split(":")
- int_strs = elements[1].split(",")
- return int(int_strs[0]), int(int_strs[1])
-
-"""
-Usage:
- command = 'mysqladmin create test -uroot -pmysqladmin12'
- for line in run_command(command):
- print(line)
-"""
-def run_command(command):
- fnull = open(os.devnull, 'w')
- p = subprocess.Popen(command,
- stdout=subprocess.PIPE,
- stderr=fnull,
- shell=True)
- return p, iter(p.stdout.readline, b'')
-
-def pickle_load(filename):
- f = open(filename, "rb")
- try:
- obj = pickle.load(f)
- except Exception:
- f.close()
- f = open(filename, "rb")
- print("Not a pickled file... try to load as text format: " + filename)
- obj = json.load(f)
- f.close()
- return obj
-
-def pickle_save(obj, filename):
- f = open(filename + ".new", "wb")
- pickle.dump(obj, f)
- f.close()
- os.rename(filename + ".new", filename)
-
-def makedirs(path):
- if not os.path.exists(path):
- os.makedirs(path)
-
-def kahan_add(total, carry, inc):
- cs = T.add_no_assoc(carry, inc)
- s = T.add_no_assoc(total, cs)
- update_carry = T.sub(cs, T.sub(s, total))
- update_total = s
- return update_total, update_carry
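Note on the removed utils.py: kahan_add performs one step of compensated (Kahan) summation, carrying the rounding error of each addition forward so that it is not lost. In the removed code the operations go through `T`, presumably theano.tensor, which the file never imports. The sketch below is a plain-Python restatement on floats, not the removed implementation. The helper names kahan_sum and naive_sum are illustrative.

```python
def kahan_add(total, carry, inc):
    """One compensated addition step; returns (new_total, new_carry)."""
    cs = carry + inc              # fold the pending error into the increment
    s = total + cs                # this addition may round
    new_carry = cs - (s - total)  # what the rounding dropped
    return s, new_carry


def kahan_sum(values):
    total, carry = 0.0, 0.0
    for v in values:
        total, carry = kahan_add(total, carry, v)
    return total + carry


def naive_sum(values):
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, 1.0, 1.0, 1.0, 1.0]
print(kahan_sum(values), naive_sum(values))
```

With these inputs each 1.0 is smaller than the spacing between doubles near 1e16, so the naive loop never moves off 1e16 while the compensated version accumulates all four increments.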
diff --git a/example/speech-demo/io_util.py b/example/speech-demo/io_util.py
deleted file mode 100644
index e5bd74c..0000000
--- a/example/speech-demo/io_util.py
+++ /dev/null
@@ -1,671 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import mxnet as mx
-import numpy as np
-import sys
-from io_func.feat_io import DataReadStream
-
-# The interface of a data iter that works for bucketing
-#
-# DataIter
-# - default_bucket_key: the bucket key for the default symbol.
-#
-# DataBatch
-# - provide_data: same as DataIter, but specific to this batch
-# - provide_label: same as DataIter, but specific to this batch
-# - bucket_key: the key for the bucket that should be used for this batch
-
-
-def read_content(path):
- with open(path) as input:
- content = input.read()
- content = content.replace('\n', ' <eos> ').replace('. ', ' <eos> ')
- return content
-
-
-class SimpleBatch(object):
- def __init__(self, data_names, data, label_names, label, bucket_key,
- utt_id=None, utt_len=0, effective_sample_count=None):
- self.data = data
- self.label = label
- self.data_names = data_names
- self.label_names = label_names
- self.bucket_key = bucket_key
- self.utt_id = utt_id
- self.utt_len = utt_len
- self.effective_sample_count = effective_sample_count
-
- self.pad = 0
- self.index = None # TODO: what is index?
-
- @property
- def provide_data(self):
- return [(n, x.shape) for n, x in zip(self.data_names, self.data)]
-
- @property
- def provide_label(self):
- if len(self.label_names):
- return [(n, x.shape) for n, x in zip(self.label_names, self.label)]
- else:
- return None
-
-class SimpleIter(mx.io.DataIter):
- """DataIter used in Calculate Statistics (in progress).
-
- Parameters
- ----------
- pad_zeros : bool
- Default `False`. Control the behavior of padding when we run
- out of the whole dataset. When true, we will pad with all-zeros.
- When false, will pad with a random sentence in the dataset.
- Usually, for training we would like to use `False`, but
- for testing use `True` so that the evaluation metric can
- choose to ignore the padding by detecting the zero-labels.
- """
- def __init__(self, train_sets, batch_size,
- init_states, delay=5, feat_dim=40, label_dim=1955,
- label_mean_sets=None, data_name='data',
- label_name='softmax_label', has_label=True, load_label_mean=True):
-
- self.train_sets = train_sets
- self.label_mean_sets = label_mean_sets
- self.train_sets.initialize_read()
-
- self.data_name = data_name
- if has_label:
- self.label_name = label_name
-
- features = []
- labels = []
- utt_lens = []
- utt_ids = []
- buckets = []
- self.has_label = has_label
-
- if label_mean_sets is not None:
- self.label_mean_sets.initialize_read()
- (feats, tgts, utt_id) = self.label_mean_sets.load_next_seq()
-
- self.label_mean = feats/np.sum(feats)
- for i, v in enumerate(feats):
- if v <= 1.0:
- self.label_mean[i] = 1
-
- sys.stderr.write("Loading data...\n")
- buckets_map = {}
- n = 0
- while True:
- (feats, tgts, utt_id) = self.train_sets.load_next_seq()
- if utt_id is None:
- break
- if tgts is None and self.has_label:
- continue
- if feats.shape[0] == 0:
- continue
- features.append(feats)
- utt_lens.append(feats.shape[0])
- utt_ids.append(utt_id)
- if self.has_label:
- labels.append(tgts+1)
- if feats.shape[0] not in buckets:
- buckets_map[feats.shape[0]] = feats.shape[0]
-
- for k, v in buckets_map.iteritems():
- buckets.append(k)
-
- buckets.sort()
- i_max_bucket = len(buckets)-1
- max_bucket = buckets[i_max_bucket]
- self.buckets = buckets
- self.data = [[] for k in buckets]
- self.utt_id = [[] for k in buckets]
- self.utt_lens = [[] for k in buckets]
- self.feat_dim = feat_dim
- self.default_bucket_key = max(buckets)
-
- for i, feats in enumerate(features):
- if has_label:
- tgts = labels[i]
- utt_len = utt_lens[i]
- utt_id = utt_ids[i]
-
- for i, bkt in enumerate(buckets):
- if bkt >= utt_len:
- i_bucket = i
- break
-
- if self.has_label:
- self.data[i_bucket].append((feats, tgts))
- else:
- self.data[i_bucket].append(feats)
- self.utt_id[i_bucket].append(utt_id)
- self.utt_lens[i_bucket].append(utt_len)
-
- # Get the size of each bucket, so that we could sample
- # uniformly from the bucket
- bucket_sizes = [len(x) for x in self.data]
-
- self.batch_size = batch_size
- # convert data into ndarrays for better speed during training
-
- data = [np.zeros((len(x), buckets[i], self.feat_dim))
- if len(x) % self.batch_size == 0
- else np.zeros(((len(x)/self.batch_size + 1) * self.batch_size, buckets[i], self.feat_dim))
- for i, x in enumerate(self.data)]
-
- label = [np.zeros((len(x), buckets[i]))
- if len(x) % self.batch_size == 0
- else np.zeros(((len(x)/self.batch_size + 1) * self.batch_size, buckets[i]))
- for i, x in enumerate(self.data)]
-
- utt_id = [[] for k in buckets]
- for i, x in enumerate(data):
- utt_id[i] = ["GAP_UTT"] * len(x)
- utt_lens = [[] for k in buckets]
- for i, x in enumerate(data):
- utt_lens[i] = [0] * len(x)
-
-
- for i_bucket in range(len(self.buckets)):
- for j in range(len(self.data[i_bucket])):
- sentence = self.data[i_bucket][j]
- if self.has_label:
- sentence[1][delay:] = sentence[1][:-delay]
- sentence[1][:delay] = sentence[1][0] # broadcast assignment
- data[i_bucket][j, :len(sentence[0])] = sentence[0]
- label[i_bucket][j, :len(sentence[1])] = sentence[1]
- else:
- data[i_bucket][j, :len(sentence)] = sentence
- # borrow this place to pass in sentence length. TODO: use a less hacky way.
- label[i_bucket][j, :len(sentence)] += len(sentence)
-
- utt_id[i_bucket][j] = self.utt_id[i_bucket][j]
- utt_lens[i_bucket][j] = self.utt_lens[i_bucket][j]
-
- self.data = data
- self.label = label
- self.utt_id = utt_id
- self.utt_lens = utt_lens
-
-
- # Get the size of each bucket, so that we could sample
- # uniformly from the bucket
- bucket_sizes = [len(x) for x in self.data]
-
- sys.stderr.write("Summary of dataset ==================\n")
- for bkt, sz in zip(buckets, bucket_sizes):
- sys.stderr.write("bucket of len %3d : %d samples\n" % (bkt, sz))
-
- bucket_size_tot = float(sum(bucket_sizes))
-
- self.bucket_sizes = bucket_sizes
- self.make_data_iter_plan()
-
- self.init_states = init_states
- self.init_state_arrays = [mx.nd.zeros(x[1]) for x in init_states]
-
- self.provide_data = [(data_name, (batch_size, self.default_bucket_key, self.feat_dim))] + init_states
- self.provide_label = None
- if has_label:
- self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key))]
-
- def make_data_iter_plan(self):
- "make a random data iteration plan"
- # truncate each bucket into multiple of batch-size
- bucket_n_batches = []
- for i in range(len(self.data)):
- bucket_n_batches.append(len(self.data[i]) / self.batch_size)
- self.data[i] = self.data[i][:int(bucket_n_batches[i]*self.batch_size),:]
- self.label[i] = self.label[i][:int(bucket_n_batches[i]*self.batch_size)]
-
- bucket_plan = np.hstack([np.zeros(n, int)+i for i, n in enumerate(bucket_n_batches)])
- np.random.shuffle(bucket_plan)
-
- bucket_idx_all = [np.random.permutation(len(x)) for x in self.data]
-
- self.bucket_plan = bucket_plan
- self.bucket_idx_all = bucket_idx_all
- self.bucket_curr_idx = [0 for x in self.data]
-
- self.data_buffer = []
- self.label_buffer = []
- for i_bucket in range(len(self.data)):
- data = mx.nd.zeros((self.batch_size, self.buckets[i_bucket], self.feat_dim))
- label = mx.nd.zeros((self.batch_size, self.buckets[i_bucket]))
- self.data_buffer.append(data)
- self.label_buffer.append(label)
-
- def __iter__(self):
- init_state_names = [x[0] for x in self.init_states]
- data_names = [self.data_name] + init_state_names
- label_names = []
- if self.has_label:
- label_names = [self.label_name]
-
- for i_bucket in self.bucket_plan:
- data = self.data_buffer[i_bucket]
- label = self.label_buffer[i_bucket]
-
- i_idx = self.bucket_curr_idx[i_bucket]
- idx = self.bucket_idx_all[i_bucket][i_idx:i_idx+self.batch_size]
- self.bucket_curr_idx[i_bucket] += self.batch_size
- data[:] = self.data[i_bucket][idx]
- label[:] = self.label[i_bucket][idx]
- data_all = [data] + self.init_state_arrays
- label_all = [label]
- utt_id = np.array(self.utt_id[i_bucket])[idx]
- utt_len = np.array(self.utt_lens[i_bucket])[idx]
- effective_sample_count = mx.nd.sum(label)
- data_batch = SimpleBatch(data_names, data_all, label_names, label_all,
- self.buckets[i_bucket], utt_id, utt_len,
- effective_sample_count=effective_sample_count)
- yield data_batch
-
- def reset(self):
- self.bucket_curr_idx = [0 for x in self.data]
-
-class TruncatedSentenceIter(mx.io.DataIter):
- """DataIter used in Truncated-BPTT.
-
- Each sentence is split into chunks of fixed lengths. The states are
- forwarded during forward, but the backward is only computed within
- chunks. This mechanism does not require bucketing, and it sometimes
- avoids exploding-gradient problems in very long sequences.
-
- Parameters
- ----------
- pad_zeros : bool
- Default `False`. Control the behavior of padding when we run
- out of the whole dataset. When true, we will pad with all-zeros.
- When false, will pad with a random sentence in the dataset.
- Usually, for training we would like to use `False`, but
- for testing use `True` so that the evaluation metric can
- choose to ignore the padding by detecting the zero-labels.
- """
- def __init__(self, train_sets, batch_size, init_states, truncate_len=20, delay=5,
- feat_dim=40, data_name='data', label_name='softmax_label',
- has_label=True, do_shuffling=True, pad_zeros=False, time_major=False):
-
- self.train_sets = train_sets
- self.train_sets.initialize_read()
-
- self.data_name = data_name
- self.label_name = label_name
-
- self.feat_dim = feat_dim
- self.has_label = has_label
- self.batch_size = batch_size
- self.truncate_len = truncate_len
- self.delay = delay
-
- self.do_shuffling = do_shuffling
- self.pad_zeros = pad_zeros
-
- self.time_major = time_major
-
- self.label = None
- if self.time_major:
- self.data = [mx.nd.zeros((truncate_len, batch_size, feat_dim))]
- if has_label:
- self.label = [mx.nd.zeros((truncate_len, batch_size))]
- else:
- self.data = [mx.nd.zeros((batch_size, truncate_len, feat_dim))]
- if has_label:
- self.label = [mx.nd.zeros((batch_size, truncate_len))]
-
- self.init_state_names = [x[0] for x in init_states]
- self.init_state_arrays = [mx.nd.zeros(x[1]) for x in init_states]
-
- self.provide_data = [(data_name, self.data[0].shape)] + init_states
- self.provide_label = None
- if has_label:
- self.provide_label = [(label_name, self.label[0].shape)]
-
- self._load_data()
- self._make_data_plan()
-
- def _load_data(self):
- sys.stderr.write('Loading data into memory...\n')
- self.features = []
- self.labels = []
- self.utt_ids = []
-
- seq_len_tot = 0.0
- while True:
- (feats, tgs, utt_id) = self.train_sets.load_next_seq()
- if utt_id is None:
- break
- if tgs is None and self.has_label:
- continue
- if feats.shape[0] == 0:
- continue
-
- if self.has_label and self.delay > 0:
- # delay the labels
- tgs[self.delay:] = tgs[:-self.delay]
- tgs[:self.delay] = tgs[0] # broadcast assignment
- self.features.append(feats)
- if self.has_label:
- self.labels.append(tgs+1)
- self.utt_ids.append(utt_id)
- seq_len_tot += feats.shape[0]
-
- sys.stderr.write(' %d utterances loaded...\n' % len(self.utt_ids))
- sys.stderr.write(' avg-sequence-len = %.0f\n' % (seq_len_tot/len(self.utt_ids)))
-
- def _make_data_plan(self):
- if self.do_shuffling:
- # TODO: should we group utterances of similar length together?
- self._data_plan = np.random.permutation(len(self.features))
- else:
- # we might not want to do shuffling for testing for example
- self._data_plan = np.arange(len(self.features))
-
- def __iter__(self):
- assert len(self._data_plan) >= self.batch_size, \
- "Total number of sentences smaller than batch size, consider using smaller batch size"
- utt_idx = self._data_plan[:self.batch_size]
- utt_inside_idx = [0] * self.batch_size
-
- next_utt_idx = self.batch_size
- is_pad = [False] * self.batch_size
- pad = 0
-
- if self.time_major:
- np_data_buffer = np.zeros((self.truncate_len, self.batch_size, self.feat_dim))
- np_label_buffer = np.zeros((self.truncate_len, self.batch_size))
- else:
- np_data_buffer = np.zeros((self.batch_size, self.truncate_len, self.feat_dim))
- np_label_buffer = np.zeros((self.batch_size, self.truncate_len))
-
- utt_id_buffer = [None] * self.batch_size
-
- data_names = [self.data_name] + self.init_state_names
- label_names = [self.label_name]
-
- # reset states
- for state in self.init_state_arrays:
- state[:] = 0.1
-
- while True:
- effective_sample_count = self.batch_size * self.truncate_len
- for i, idx in enumerate(utt_idx):
- fea_utt = self.features[idx]
- if utt_inside_idx[i] >= fea_utt.shape[0]:
- # we have consumed this sentence
-
- # reset the states
- for state in self.init_state_arrays:
- if self.time_major:
- state[:, i:i+1, :] = 0.1
- else:
- state[i:i+1] = 0.1
- # load new sentence
- if is_pad[i]:
- # I am already a padded sentence, just rewind to the
- # beginning of the sentence
- utt_inside_idx[i] = 0
- elif next_utt_idx >= len(self.features):
- # we consumed the whole dataset, simply repeat this sentence
- # and set pad
- pad += 1
- is_pad[i] = True
- utt_inside_idx[i] = 0
- else:
- # move to the next sentence
- utt_idx[i] = self._data_plan[next_utt_idx]
- idx = utt_idx[i]
- fea_utt = self.features[idx]
- utt_inside_idx[i] = 0
- next_utt_idx += 1
-
- if is_pad[i] and self.pad_zeros:
- np_data_buffer[i] = 0
- np_label_buffer[i] = 0
- effective_sample_count -= self.truncate_len
- else:
- idx_take = slice(utt_inside_idx[i],
- min(utt_inside_idx[i]+self.truncate_len,
- fea_utt.shape[0]))
- n_take = idx_take.stop - idx_take.start
- if self.time_major:
- np_data_buffer[:n_take, i, :] = fea_utt[idx_take]
- np_label_buffer[:n_take, i] = self.labels[idx][idx_take]
- else:
- np_data_buffer[i, :n_take, :] = fea_utt[idx_take]
- np_label_buffer[i, :n_take] = self.labels[idx][idx_take]
-
- if n_take < self.truncate_len:
- if self.time_major:
- np_data_buffer[n_take:, i, :] = 0
- np_label_buffer[n_take:, i] = 0
- else:
- np_data_buffer[i, n_take:, :] = 0
- np_label_buffer[i, n_take:] = 0
-
- effective_sample_count -= self.truncate_len - n_take
-
- utt_inside_idx[i] += n_take
-
- utt_id_buffer[i] = self.utt_ids[idx]
-
- if pad == self.batch_size:
- # finished all the sentences
- break
-
- self.data[0][:] = np_data_buffer
- self.label[0][:] = np_label_buffer
-
- data_batch = SimpleBatch(data_names,
- self.data + self.init_state_arrays,
- label_names, self.label, bucket_key=None,
- utt_id=utt_id_buffer,
- effective_sample_count=effective_sample_count)
-
- # Instead of using the 'pad' property, we use an array 'is_pad', because
- # a padded sentence could be in the middle of a batch. A sample is padding
- # if the data set has run out and it is previously seen data repeated
- # only to fill up a whole batch. In prediction, such samples
- # should be ignored
- data_batch.is_pad = is_pad
-
- yield data_batch
-
- def reset(self):
- self._make_data_plan()
-
-
-class BucketSentenceIter(mx.io.DataIter):
- def __init__(self, train_sets, buckets, batch_size,
- init_states, delay=5, feat_dim=40,
- data_name='data', label_name='softmax_label', has_label=True):
-
- self.train_sets = train_sets
- self.train_sets.initialize_read()
-
- self.data_name = data_name
- self.label_name = label_name
-
- buckets.sort()
- i_max_bucket = len(buckets)-1
- max_bucket = buckets[i_max_bucket]
-
- if has_label != True:
- buckets = [i for i in range(1, max_bucket)]
- i_max_bucket = len(buckets)-1
- max_bucket = buckets[i_max_bucket]
-
- self.buckets = buckets
- self.data = [[] for k in buckets]
- self.utt_id = [[] for k in buckets]
- self.feat_dim = feat_dim
- self.default_bucket_key = max(buckets)
- self.has_label = has_label
-
- sys.stderr.write("Loading data...\n")
- T_OVERLAP = buckets[0]/2
- n = 0
- while True:
- (feats, tgts, utt_id) = self.train_sets.load_next_seq()
- if utt_id is None:
- break
- if tgts is None and self.has_label:
- continue
- if feats.shape[0] == 0:
- continue
-
- # we split sentence into overlapping segments if it is
- # longer than the largest bucket
- t_start = 0
- t_end = feats.shape[0]
- while t_start < t_end:
- if t_end - t_start > max_bucket:
- t_take = max_bucket
- i_bucket = i_max_bucket
- else:
- for i, bkt in enumerate(buckets):
- if bkt >= t_end-t_start:
- t_take = t_end-t_start
- i_bucket = i
- break
-
- n += 1
- if self.has_label:
- self.data[i_bucket].append((feats[t_start:t_start+t_take],
- tgts[t_start:t_start+t_take]+1))
- else:
- self.data[i_bucket].append(feats[t_start:t_start+t_take])
-
- self.utt_id[i_bucket].append(utt_id)
- t_start += t_take
- if t_start >= t_end:
- # this sentence is consumed
- break
- t_start -= T_OVERLAP
-
- # Get the size of each bucket, so that we could sample
- # uniformly from the bucket
- bucket_sizes = [len(x) for x in self.data]
-
- self.batch_size = batch_size
- # convert data into ndarrays for better speed during training
-
- data = [np.zeros((len(x), buckets[i], self.feat_dim))
- if len(x) % self.batch_size == 0
- else np.zeros(((len(x)/self.batch_size + 1) * self.batch_size, buckets[i],
- self.feat_dim))
- for i, x in enumerate(self.data)]
-
- label = [np.zeros((len(x), buckets[i]))
- if len(x) % self.batch_size == 0
- else np.zeros(((len(x)/self.batch_size + 1) * self.batch_size, buckets[i]))
- for i, x in enumerate(self.data)]
-
- utt_id = [[] for k in buckets]
- for i, x in enumerate(data):
- utt_id[i] = ["GAP_UTT"] * len(x)
-
- for i_bucket in range(len(self.buckets)):
- for j in range(len(self.data[i_bucket])):
- sentence = self.data[i_bucket][j]
- if self.has_label:
- sentence[1][delay:] = sentence[1][:-delay]
- sentence[1][:delay] = sentence[1][0] # broadcast assignment
- data[i_bucket][j, :len(sentence[0])] = sentence[0]
- label[i_bucket][j, :len(sentence[1])] = sentence[1]
- else:
- data[i_bucket][j, :len(sentence)] = sentence
- # borrow this place to pass in sentence length. TODO: use a less hacky way.
- label[i_bucket][j, :len(sentence)] += len(sentence)
-
- utt_id[i_bucket][j] = self.utt_id[i_bucket][j]
-
- self.data = data
- self.label = label
- self.utt_id = utt_id
-
- # Get the size of each bucket, so that we could sample
- # uniformly from the bucket
- bucket_sizes = [len(x) for x in self.data]
-
- sys.stderr.write("Summary of dataset ==================\n")
- for bkt, sz in zip(buckets, bucket_sizes):
- sys.stderr.write("bucket of len %3d : %d samples\n" % (bkt, sz))
-
- self.bucket_sizes = bucket_sizes
- self.make_data_iter_plan()
-
- self.init_states = init_states
- self.init_state_arrays = [mx.nd.zeros(x[1]) for x in init_states]
-
- self.provide_data = [(data_name, (batch_size, self.default_bucket_key, self.feat_dim))] + \
- init_states
- self.provide_label = [(label_name, (self.batch_size, self.default_bucket_key))]
-
- def make_data_iter_plan(self):
- "make a random data iteration plan"
- # truncate each bucket into multiple of batch-size
- bucket_n_batches = []
- for i in range(len(self.data)):
- bucket_n_batches.append(len(self.data[i]) / self.batch_size)
- self.data[i] = self.data[i][:int(bucket_n_batches[i]*self.batch_size), :]
- self.label[i] = self.label[i][:int(bucket_n_batches[i]*self.batch_size)]
-
- bucket_plan = np.hstack([np.zeros(n, int)+i for i, n in enumerate(bucket_n_batches)])
- np.random.shuffle(bucket_plan)
-
- bucket_idx_all = [np.random.permutation(len(x)) for x in self.data]
-
- self.bucket_plan = bucket_plan
- self.bucket_idx_all = bucket_idx_all
- self.bucket_curr_idx = [0 for x in self.data]
-
- self.data_buffer = []
- self.label_buffer = []
- for i_bucket in range(len(self.data)):
- data = mx.nd.zeros((self.batch_size, self.buckets[i_bucket], self.feat_dim))
- label = mx.nd.zeros((self.batch_size, self.buckets[i_bucket]))
- self.data_buffer.append(data)
- self.label_buffer.append(label)
-
- def __iter__(self):
- init_state_names = [x[0] for x in self.init_states]
- data_names = [self.data_name] + init_state_names
- label_names = [self.label_name]
-
- for i_bucket in self.bucket_plan:
- data = self.data_buffer[i_bucket]
- label = self.label_buffer[i_bucket]
-
- i_idx = self.bucket_curr_idx[i_bucket]
- idx = self.bucket_idx_all[i_bucket][i_idx:i_idx+self.batch_size]
- self.bucket_curr_idx[i_bucket] += self.batch_size
- data[:] = self.data[i_bucket][idx]
- label[:] = self.label[i_bucket][idx]
- data_all = [data] + self.init_state_arrays
- label_all = [label]
- utt_id = np.array(self.utt_id[i_bucket])[idx]
- effective_sample_count = mx.nd.sum(label)
- data_batch = SimpleBatch(data_names, data_all, label_names, label_all,
- self.buckets[i_bucket], utt_id,
- effective_sample_count=effective_sample_count)
- yield data_batch
-
- def reset(self):
- self.bucket_curr_idx = [0 for x in self.data]
-
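Note on the removed io_util.py above: its iterators rely on three small operations, namely picking the smallest bucket that fits an utterance, delaying the labels by a fixed number of frames, and rounding each bucket up to a multiple of the batch size. The block below is a minimal NumPy sketch of those steps. The function names `assign_bucket`, `delay_labels` and `padded_rows` are hypothetical and do not appear in the removed file, and the sketch uses integer division (`//`), whereas the removed code used `/`, which yields a float under Python 3.

```python
import bisect

import numpy as np


def assign_bucket(buckets, length):
    """Index of the smallest bucket (sorted ascending) that holds `length` frames."""
    i = bisect.bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError("utterance longer than the largest bucket")
    return i


def delay_labels(labels, delay):
    """Shift labels right by `delay` frames, repeating the first label."""
    labels = np.asarray(labels)
    out = labels.copy()
    if delay > 0:
        out[delay:] = labels[:-delay]
        out[:delay] = labels[0]   # broadcast assignment
    return out


def padded_rows(n_samples, batch_size):
    """Round n_samples up to the next multiple of batch_size."""
    return -(-n_samples // batch_size) * batch_size


if __name__ == "__main__":
    print(assign_bucket([10, 20, 30], 11))
    print(delay_labels([1, 2, 3, 4, 5], 2))
    print(padded_rows(10, 4))
```

Unlike the removed code, `delay_labels` returns a new array rather than modifying its input in place, which keeps the sketch free of aliasing concerns.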
diff --git a/example/speech-demo/lstm_proj.py b/example/speech-demo/lstm_proj.py
deleted file mode 100644
index a27518c..0000000
--- a/example/speech-demo/lstm_proj.py
+++ /dev/null
@@ -1,153 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint:skip-file
-import mxnet as mx
-import numpy as np
-from collections import namedtuple
-
-LSTMState = namedtuple("LSTMState", ["c", "h"])
-LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias",
- "h2h_weight", "h2h_bias",
- "ph2h_weight",
- "c2i_bias", "c2f_bias", "c2o_bias"])
-LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol",
- "init_states", "last_states",
- "seq_data", "seq_labels", "seq_outputs",
- "param_blocks"])
-
-def lstm(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0., num_hidden_proj=0):
- """LSTM Cell symbol"""
- if dropout > 0.:
- indata = mx.sym.Dropout(data=indata, p=dropout)
-
- i2h = mx.sym.FullyConnected(data=indata,
- weight=param.i2h_weight,
- bias=param.i2h_bias,
- num_hidden=num_hidden * 4,
- name="t%d_l%d_i2h" % (seqidx, layeridx))
- h2h = mx.sym.FullyConnected(data=prev_state.h,
- weight=param.h2h_weight,
- #bias=param.h2h_bias,
- no_bias=True,
- num_hidden=num_hidden * 4,
- name="t%d_l%d_h2h" % (seqidx, layeridx))
- gates = i2h + h2h
- slice_gates = mx.sym.SliceChannel(gates, num_outputs=4,
- name="t%d_l%d_slice" % (seqidx, layeridx))
-
- Wcidc = mx.sym.broadcast_mul(param.c2i_bias, prev_state.c) + slice_gates[0]
- in_gate = mx.sym.Activation(Wcidc, act_type="sigmoid")
- in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh")
-
- Wcfdc = mx.sym.broadcast_mul(param.c2f_bias, prev_state.c) + slice_gates[2]
- forget_gate = mx.sym.Activation(Wcfdc, act_type="sigmoid")
- next_c = (forget_gate * prev_state.c) + (in_gate * in_transform)
-
- Wcoct = mx.sym.broadcast_mul(param.c2o_bias, next_c) + slice_gates[3]
- out_gate = mx.sym.Activation(Wcoct, act_type="sigmoid")
-
- next_h = out_gate * mx.sym.Activation(next_c, act_type="tanh")
-
- if num_hidden_proj > 0:
- proj_next_h = mx.sym.FullyConnected(data=next_h,
- weight=param.ph2h_weight,
- no_bias=True,
- num_hidden=num_hidden_proj,
- name="t%d_l%d_ph2h" % (seqidx, layeridx))
-
- return LSTMState(c=next_c, h=proj_next_h)
- else:
- return LSTMState(c=next_c, h=next_h)
-
-def lstm_unroll(num_lstm_layer, seq_len, input_size,
- num_hidden, num_label, dropout=0., output_states=False, take_softmax=True, num_hidden_proj=0):
-
- cls_weight = mx.sym.Variable("cls_weight")
- cls_bias = mx.sym.Variable("cls_bias")
- param_cells = []
- last_states = []
- for i in range(num_lstm_layer):
- param_cells.append(LSTMParam(i2h_weight = mx.sym.Variable("l%d_i2h_weight" % i),
- i2h_bias = mx.sym.Variable("l%d_i2h_bias" % i),
- h2h_weight = mx.sym.Variable("l%d_h2h_weight" % i),
- h2h_bias = mx.sym.Variable("l%d_h2h_bias" % i),
- ph2h_weight = mx.sym.Variable("l%d_ph2h_weight" % i),
- c2i_bias = mx.sym.Variable("l%d_c2i_bias" % i, shape=(1,num_hidden)),
- c2f_bias = mx.sym.Variable("l%d_c2f_bias" % i, shape=(1,num_hidden)),
- c2o_bias = mx.sym.Variable("l%d_c2o_bias" % i, shape=(1, num_hidden))
- ))
- state = LSTMState(c=mx.sym.Variable("l%d_init_c" % i),
- h=mx.sym.Variable("l%d_init_h" % i))
- last_states.append(state)
- assert(len(last_states) == num_lstm_layer)
-
- data = mx.sym.Variable('data')
- label = mx.sym.Variable('softmax_label')
-
- dataSlice = mx.sym.SliceChannel(data=data, num_outputs=seq_len, squeeze_axis=1)
-
- hidden_all = []
- for seqidx in range(seq_len):
- hidden = dataSlice[seqidx]
-
- # stack LSTM
- for i in range(num_lstm_layer):
- if i == 0:
- dp = 0.
- else:
- dp = dropout
- next_state = lstm(num_hidden, indata=hidden,
- prev_state=last_states[i],
- param=param_cells[i],
- seqidx=seqidx, layeridx=i, dropout=dp, num_hidden_proj=num_hidden_proj)
- hidden = next_state.h
- last_states[i] = next_state
- # decoder
- if dropout > 0.:
- hidden = mx.sym.Dropout(data=hidden, p=dropout)
- hidden_all.append(hidden)
-
- hidden_concat = mx.sym.Concat(*hidden_all, dim=1)
- if num_hidden_proj > 0:
- hidden_final = mx.sym.Reshape(hidden_concat, target_shape=(0, num_hidden_proj))
- else:
- hidden_final = mx.sym.Reshape(hidden_concat, target_shape=(0, num_hidden))
- pred = mx.sym.FullyConnected(data=hidden_final, num_hidden=num_label,
- weight=cls_weight, bias=cls_bias, name='pred')
- pred = mx.sym.Reshape(pred, shape=(-1, num_label))
- label = mx.sym.Reshape(label, shape=(-1,))
- if take_softmax:
- sm = mx.sym.SoftmaxOutput(data=pred, label=label, ignore_label=0,
- use_ignore=True, name='softmax')
- else:
- sm = pred
-
- if output_states:
- # block the gradients of output states
- for i in range(num_lstm_layer):
- state = last_states[i]
- state = LSTMState(c=mx.sym.BlockGrad(state.c, name="l%d_last_c" % i),
- h=mx.sym.BlockGrad(state.h, name="l%d_last_h" % i))
- last_states[i] = state
-
- # also output states, used in truncated-bptt to copy over states
- unpack_c = [state.c for state in last_states]
- unpack_h = [state.h for state in last_states]
- sm = mx.sym.Group([sm] + unpack_c + unpack_h)
-
- return sm
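Note on the removed lstm_proj.py above: the `lstm` symbol builds an LSTM cell with peephole terms (`c2i_bias`, `c2f_bias`, `c2o_bias`) and an optional output projection (`ph2h_weight`). The block below is a minimal NumPy sketch of a single time step under the same gate ordering, intended only to make the arithmetic explicit. `lstmp_step` and the parameter dictionary `p` are hypothetical, and the weight layout assumes the FullyConnected convention of output = data · weightᵀ + bias.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstmp_step(x, h_prev, c_prev, p):
    """One peephole-LSTM step with optional projection (illustrative sketch).

    x: (batch, input_dim), h_prev: (batch, proj_or_hidden), c_prev: (batch, hidden).
    p: dict of arrays keyed like the removed LSTMParam fields.
    """
    # i2h carries the bias; h2h is bias-free, as in the removed symbol.
    gates = x @ p["i2h_weight"].T + p["i2h_bias"] + h_prev @ p["h2h_weight"].T
    g_in, g_trans, g_forget, g_out = np.split(gates, 4, axis=1)

    in_gate = sigmoid(p["c2i_bias"] * c_prev + g_in)          # peephole on c_prev
    in_transform = np.tanh(g_trans)
    forget_gate = sigmoid(p["c2f_bias"] * c_prev + g_forget)  # peephole on c_prev
    next_c = forget_gate * c_prev + in_gate * in_transform

    out_gate = sigmoid(p["c2o_bias"] * next_c + g_out)        # peephole on next_c
    next_h = out_gate * np.tanh(next_c)

    proj = p.get("ph2h_weight")
    if proj is not None:
        next_h = next_h @ proj.T                              # bias-free projection
    return next_h, next_c


if __name__ == "__main__":
    hidden, proj_dim, input_dim, batch = 4, 2, 3, 2
    rng = np.random.default_rng(0)
    p = {
        "i2h_weight": rng.normal(size=(4 * hidden, input_dim)),
        "i2h_bias": np.zeros(4 * hidden),
        "h2h_weight": rng.normal(size=(4 * hidden, proj_dim)),
        "c2i_bias": np.zeros((1, hidden)),
        "c2f_bias": np.zeros((1, hidden)),
        "c2o_bias": np.zeros((1, hidden)),
        "ph2h_weight": rng.normal(size=(proj_dim, hidden)),
    }
    h, c = lstmp_step(rng.normal(size=(batch, input_dim)),
                      np.zeros((batch, proj_dim)), np.zeros((batch, hidden)), p)
    print(h.shape, c.shape)
```

With projection enabled, the recurrent input has the projected width, so `h2h_weight` has `proj_dim` columns while the cell state keeps the full hidden width.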
diff --git a/example/speech-demo/make_stats.py b/example/speech-demo/make_stats.py
deleted file mode 100644
index 64991db..0000000
--- a/example/speech-demo/make_stats.py
+++ /dev/null
@@ -1,103 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import re
-import sys
-sys.path.insert(0, "../../python")
-import time
-import logging
-import os.path
-
-import mxnet as mx
-import numpy as np
-
-from lstm_proj import lstm_unroll
-from io_util import BucketSentenceIter, TruncatedSentenceIter, SimpleIter, DataReadStream
-from config_util import parse_args, get_checkpoint_path, parse_contexts
-
-from io_func.feat_readers.writer_kaldi import KaldiWriteOut
-
-# some constants
-METHOD_BUCKETING = 'bucketing'
-METHOD_TBPTT = 'truncated-bptt'
-METHOD_SIMPLE = 'simple'
-
-
-def prepare_data(args):
- batch_size = args.config.getint('train', 'batch_size')
- num_hidden = args.config.getint('arch', 'num_hidden')
- num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
-
- init_c = [('l%d_init_c' % l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
- init_h = [('l%d_init_h' % l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-
- init_states = init_c + init_h
-
- file_test = args.config.get('data', 'train')
-
- file_format = args.config.get('data', 'format')
- feat_dim = args.config.getint('data', 'xdim')
-
- test_data_args = {
- "gpu_chunk": 32768,
- "lst_file": file_test,
- "file_format": file_format,
- "separate_lines": True,
- "has_labels": True
- }
-
- test_sets = DataReadStream(test_data_args, feat_dim)
-
- return (init_states, test_sets)
-
-
-if __name__ == '__main__':
- args = parse_args()
- args.config.write(sys.stderr)
-
- decoding_method = args.config.get('train', 'method')
- contexts = parse_contexts(args)
-
- init_states, test_sets = prepare_data(args)
- state_names = [x[0] for x in init_states]
-
- batch_size = args.config.getint('train', 'batch_size')
- num_hidden = args.config.getint('arch', 'num_hidden')
- num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
- feat_dim = args.config.getint('data', 'xdim')
- label_dim = args.config.getint('data', 'ydim')
- out_file = args.config.get('data', 'out_file')
- num_epoch = args.config.getint('train', 'num_epoch')
- model_name = get_checkpoint_path(args)
- logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(message)s')
-
- # load the model
- label_mean = np.zeros((label_dim,1), dtype='float32')
- data_test = TruncatedSentenceIter(test_sets, batch_size, init_states,
- 20, feat_dim=feat_dim,
- do_shuffling=False, pad_zeros=True, has_label=True)
-
- for i, batch in enumerate(data_test.labels):
- hist, edges = np.histogram(batch.flat, bins=range(0,label_dim+1))
- label_mean += hist.reshape(label_dim,1)
-
- kaldiWriter = KaldiWriteOut(None, out_file)
- kaldiWriter.open_or_fd()
- kaldiWriter.write("label_mean", label_mean)
-
-
- args.config.write(sys.stderr)
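Note on the removed make_stats.py above: its main loop accumulates a per-class label count by histogramming each utterance's labels. The block below is a minimal NumPy sketch of that accumulation on in-memory arrays. `label_counts` is a hypothetical helper, and the Kaldi writer and config handling of the removed script are deliberately left out.

```python
import numpy as np


def label_counts(label_seqs, label_dim):
    """Count how often each label id in [0, label_dim) occurs across sequences."""
    counts = np.zeros((label_dim, 1), dtype="float32")
    edges = np.arange(label_dim + 1)   # integer bin edges 0, 1, ..., label_dim
    for seq in label_seqs:
        hist, _ = np.histogram(np.ravel(seq), bins=edges)
        counts += hist.reshape(label_dim, 1)
    return counts


if __name__ == "__main__":
    seqs = [np.array([0, 1, 1]), np.array([2, 2, 2])]
    print(label_counts(seqs, 4).ravel())
```

One caveat worth keeping in mind: `np.histogram` treats the last bin as closed on both ends, so a label equal to `label_dim` would be counted in the final bin rather than rejected.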
diff --git a/example/speech-demo/python_wrap/Makefile b/example/speech-demo/python_wrap/Makefile
deleted file mode 100644
index 2c020b0..0000000
--- a/example/speech-demo/python_wrap/Makefile
+++ /dev/null
@@ -1,13 +0,0 @@
-
-
-all:
-
-include ../kaldi.mk
-
-OBJFILES = ctypes.o
-
-LIBNAME = kaldi-python-wrap
-
-ADDLIBS = ../util/kaldi-util.a ../matrix/kaldi-matrix.a ../base/kaldi-base.a ../hmm/kaldi-hmm.a ../cudamatrix/kaldi-cudamatrix.a ../nnet/kaldi-nnet.a ../thread/kaldi-thread.a
-
-include ../makefiles/default_rules.mk
diff --git a/example/speech-demo/python_wrap/ctypes.cc b/example/speech-demo/python_wrap/ctypes.cc
deleted file mode 100644
index a2c7946..0000000
--- a/example/speech-demo/python_wrap/ctypes.cc
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-#include <iostream>
-
-#include "util/table-types.h"
-#include "hmm/posterior.h"
-#include "nnet/nnet-nnet.h"
-#include "cudamatrix/cu-device.h"
-
-class Foo{
- public:
- Foo() {
- x[0] = 0.5f;
- x[1] = 1.5f;
- x[2] = 2.5f;
- x[3] = 3.5f;
- x[4] = 4.5f;
- }
- void bar(){
- std::cout << "Hello" << std::endl;
- }
- float * getx() {
- return x;
- }
- int sizex() {
- return sizeof(x) / sizeof(float);
- }
- private:
- float x[5];
-};
-
-namespace kaldi {
- typedef SequentialBaseFloatMatrixReader SBFMReader;
- typedef Matrix<BaseFloat> MatrixF;
- typedef RandomAccessPosteriorReader RAPReader;
-
- namespace nnet1 {
- typedef class Nnet_t_ {
- public:
- Nnet nnet_transf;
- CuMatrix<BaseFloat> feats_transf;
- MatrixF buf;
- } Nnet_t;
- }
-}
-
-extern "C" {
-
- Foo* Foo_new(){ return new Foo(); }
- void Foo_bar(Foo* foo){ foo->bar(); }
- float * Foo_getx(Foo* foo) { return foo->getx(); }
- int Foo_sizex(Foo* foo) { return foo->sizex(); }
-
- using namespace kaldi;
- using namespace kaldi::nnet1;
-
- /****************************** SBFMReader ******************************/
-
- //SequentialTableReader(): impl_(NULL) { }
- SBFMReader* SBFMReader_new() {
- return new SBFMReader();
- }
- //SequentialTableReader(const std::string &rspecifier);
- SBFMReader* SBFMReader_new_char(char * rspecifier) {
- return new SBFMReader(rspecifier);
- }
- //bool Open(const std::string &rspecifier);
- int SBFMReader_Open(SBFMReader* r, char * rspecifier) {
- return r->Open(rspecifier);
- }
- //inline bool Done();
- int SBFMReader_Done(SBFMReader* r) {
- return r->Done();
- }
- //inline std::string Key();
- const char * SBFMReader_Key(SBFMReader* r) {
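- static std::string key; key = r->Key(); return key.c_str(); //Key() returns by value, so keep a copy alive (not thread-safe)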
- }
- //void FreeCurrent();
- void SBFMReader_FreeCurrent(SBFMReader* r) {
- r->FreeCurrent();
- }
- //const T &Value();
- const MatrixF * SBFMReader_Value(SBFMReader* r) {
- return &r->Value(); //despite how dangerous this looks, this is safe because holder maintains object (it's not stack allocated)
- }
- //void Next();
- void SBFMReader_Next(SBFMReader* r) {
- r->Next();
- }
- //bool IsOpen() const;
- int SBFMReader_IsOpen(SBFMReader* r) {
- return r->IsOpen();
- }
- //bool Close();
- int SBFMReader_Close(SBFMReader* r) {
- return r->Close();
- }
- //~SequentialTableReader();
- void SBFMReader_Delete(SBFMReader* r) {
- delete r;
- }
-
- /****************************** MatrixF ******************************/
-
- //NumRows ()
- int MatrixF_NumRows(MatrixF *m) {
- return m->NumRows();
- }
- //NumCols ()
- int MatrixF_NumCols(MatrixF *m) {
- return m->NumCols();
- }
-
- //Stride ()
- int MatrixF_Stride(MatrixF *m) {
- return m->Stride();
- }
-
- void MatrixF_cpy_to_ptr(MatrixF *m, float * dst, int dst_stride) {
- int num_rows = m->NumRows();
- int num_cols = m->NumCols();
- int src_stride = m->Stride();
- int bytes_per_row = num_cols * sizeof(float);
-
- float * src = m->Data();
-
- for (int r=0; r<num_rows; r++) {
- memcpy(dst, src, bytes_per_row);
- src += src_stride;
- dst += dst_stride;
- }
- }
-
- //SizeInBytes ()
- int MatrixF_SizeInBytes(MatrixF *m) {
- return m->SizeInBytes();
- }
- //Data (), Real is usually float32
- const float * MatrixF_Data(MatrixF *m) {
- return m->Data();
- }
-
- /****************************** RAPReader ******************************/
-
- RAPReader* RAPReader_new_char(char * rspecifier) {
- return new RAPReader(rspecifier);
- }
-
- //bool HasKey (const std::string &key)
- int RAPReader_HasKey(RAPReader* r, char * key) {
- return r->HasKey(key);
- }
-
- //const T & Value (const std::string &key)
- int * RAPReader_Value(RAPReader* r, char * key) {
- //return &r->Value(key);
- const Posterior p = r->Value(key);
- int num_rows = p.size();
- if (num_rows == 0) {
- return NULL;
- }
-
- //std::cout << "num_rows " << num_rows << std::endl;
-
- int * vals = new int[num_rows];
-
- for (int row=0; row<num_rows; row++) {
- int num_cols = p.at(row).size();
- if (num_cols != 1) {
- std::cout << "num_cols != 1: " << num_cols << std::endl;
- delete[] vals;
- return NULL;
- }
- std::pair<int32, BaseFloat> pair = p.at(row).at(0);
- if (pair.second != 1) {
- std::cout << "pair.second != 1: " << pair.second << std::endl;
- delete[] vals;
- return NULL;
- }
- vals[row] = pair.first;
- }
-
- return vals;
- }
-
- void RAPReader_DeleteValue(RAPReader* r, int * vals) {
- delete[] vals;
- }
-
- //~RandomAccessTableReader ()
- void RAPReader_Delete(RAPReader* r) {
- delete r;
- }
-
- /****************************** Nnet_t ******************************/
-
- Nnet_t* Nnet_new(char * filename, float dropout_retention, int crossvalidate) {
- //std::cout << "dropout_retention " << dropout_retention << " crossvalidate " << crossvalidate << std::endl;
-
- Nnet_t * nnet = new Nnet_t();
-
- if(strcmp(filename, "") != 0) {
- nnet->nnet_transf.Read(filename);
- }
-
- if (dropout_retention > 0.0) {
- nnet->nnet_transf.SetDropoutRate(dropout_retention);
- }
- if (crossvalidate) {
- nnet->nnet_transf.SetDropoutRate(1.0);
- }
-
- return nnet;
- }
-
- const MatrixF * Nnet_Feedforward(Nnet_t* nnet, MatrixF * inputs) {
- nnet->nnet_transf.Feedforward(CuMatrix<BaseFloat>(*inputs), &nnet->feats_transf);
- nnet->buf.Resize(nnet->feats_transf.NumRows(), nnet->feats_transf.NumCols());
- nnet->feats_transf.CopyToMat(&nnet->buf);
- return &nnet->buf;
- }
-
- void Nnet_Delete(Nnet_t* nnet) {
- delete nnet;
- }
-}
diff --git a/example/speech-demo/python_wrap/example_usage/README.txt b/example/speech-demo/python_wrap/example_usage/README.txt
deleted file mode 100644
index 23fbb3d..0000000
--- a/example/speech-demo/python_wrap/example_usage/README.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-# If not already done, make sure kaldi/src is compiled as shared libraries
-cd kaldi/src
-./configure --shared
-make depend
-make
-
-# Copy python_wrap/ to kaldi/src and compile it
-cd python_wrap/
-make
-
-cd example_usage/
-# Add kaldi/src/lib to LD_LIBRARY_PATH
-export LD_LIBRARY_PATH=../../lib:$LD_LIBRARY_PATH
-python example.py
\ No newline at end of file
diff --git a/example/speech-demo/python_wrap/example_usage/data.ark b/example/speech-demo/python_wrap/example_usage/data.ark
deleted file mode 100644
index d4939db..0000000
Binary files a/example/speech-demo/python_wrap/example_usage/data.ark and /dev/null differ
diff --git a/example/speech-demo/python_wrap/example_usage/data.scp b/example/speech-demo/python_wrap/example_usage/data.scp
deleted file mode 100644
index 10589e8..0000000
--- a/example/speech-demo/python_wrap/example_usage/data.scp
+++ /dev/null
@@ -1 +0,0 @@
-test_feat data.ark:10
diff --git a/example/speech-demo/python_wrap/example_usage/data.txt b/example/speech-demo/python_wrap/example_usage/data.txt
deleted file mode 100644
index de5b46e..0000000
--- a/example/speech-demo/python_wrap/example_usage/data.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-test_feat [
- 1.2345 6.789
- -9.876 0.0001 ]
diff --git a/example/speech-demo/python_wrap/example_usage/example.py b/example/speech-demo/python_wrap/example_usage/example.py
deleted file mode 100644
index d930327..0000000
--- a/example/speech-demo/python_wrap/example_usage/example.py
+++ /dev/null
@@ -1,111 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-import ctypes
-import numpy
-
-c_float_ptr = ctypes.POINTER(ctypes.c_float)
-c_int_ptr = ctypes.POINTER(ctypes.c_int)
-c_void_p = ctypes.c_void_p
-c_int = ctypes.c_int
-c_char_p = ctypes.c_char_p
-c_float = ctypes.c_float
-
-kaldi = ctypes.cdll.LoadLibrary("libkaldi-python-wrap.so") # this needs to be in LD_LIBRARY_PATH
-
-def decl(f, restype, argtypes):
- f.restype = restype
- if argtypes is not None and len(argtypes) != 0:
- f.argtypes = argtypes
-
-decl(kaldi.Foo_new, c_void_p, [])
-decl(kaldi.Foo_bar, None, [c_void_p])
-decl(kaldi.Foo_getx, c_float_ptr, [c_void_p])
-decl(kaldi.Foo_sizex, c_int, [c_void_p])
-
-decl(kaldi.SBFMReader_new, c_void_p, [])
-decl(kaldi.SBFMReader_new_char, c_void_p, [c_char_p])
-decl(kaldi.SBFMReader_Open, c_int, [c_void_p, c_char_p])
-decl(kaldi.SBFMReader_Done, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Key, c_char_p, [c_void_p])
-decl(kaldi.SBFMReader_FreeCurrent, None, [c_void_p])
-decl(kaldi.SBFMReader_Value, c_void_p, [c_void_p])
-decl(kaldi.SBFMReader_Next, None, [c_void_p])
-decl(kaldi.SBFMReader_IsOpen, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Close, c_int, [c_void_p])
-decl(kaldi.SBFMReader_Delete, None, [c_void_p])
-
-decl(kaldi.MatrixF_NumRows, c_int, [c_void_p])
-decl(kaldi.MatrixF_NumCols, c_int, [c_void_p])
-decl(kaldi.MatrixF_Stride, c_int, [c_void_p])
-decl(kaldi.MatrixF_cpy_to_ptr, None, [c_void_p, c_float_ptr, c_int])
-decl(kaldi.MatrixF_SizeInBytes, c_int, [c_void_p])
-decl(kaldi.MatrixF_Data, c_float_ptr, [c_void_p])
-
-if __name__ == "__main__":
- print("-------- Foo class example --------")
- a = kaldi.Foo_new()
- print("Calling Foo_bar(): ", end="")
- kaldi.Foo_bar(a)
- print()
- print("Result of Foo_getx(): ", kaldi.Foo_getx(a))
- print("Result of Foo_sizex(): ", kaldi.Foo_sizex(a))
-
- print()
- print("-------- Kaldi SBFMReader and MatrixF class example --------")
-
- reader = kaldi.SBFMReader_new_char(b"scp:data.scp")
-
- # data.scp has exactly one utterance, assert it's there
- assert(not kaldi.SBFMReader_Done(reader))
-
- utt_id = kaldi.SBFMReader_Key(reader)
-
- feat_value = kaldi.SBFMReader_Value(reader)
- feat_rows = kaldi.MatrixF_NumRows(feat_value)
- feat_cols = kaldi.MatrixF_NumCols(feat_value)
- feat_data = kaldi.MatrixF_Data(feat_value)
-
- # never use numpy.ndarray(buf=) or numpy.ctypeslib.as_array
- # because you don't know if Python or C owns buffer
- # (even if you numpy.copy() resulting array)
- # http://stackoverflow.com/questions/4355524/getting-data-from-ctypes-array-into-numpy
- #
- # Can't use memmove/memcpy because arrays are strided
- # Use cpy_to_ptr
- feats = numpy.empty((feat_rows,feat_cols), dtype=numpy.float32)
-
- # MUST: cast Python int to pointer, otherwise C interprets as 32-bit
- # if you print the pointer value before casting, you might see weird value before seg fault
- # casting fixes that
- feats_numpy_ptr = ctypes.cast(feats.ctypes.data, c_float_ptr)
- kaldi.MatrixF_cpy_to_ptr(feat_value, feats_numpy_ptr, feats.strides[0]//4)
-
- print("Read utterance:")
- print(" ID: ", utt_id)
- print(" Rows: ", feat_rows)
- print(" Cols: ", feat_cols)
- print(" Value: ", feat_data)
- print(feats)
- print(" This should match data.txt")
-
- # assert no more utterances left
- kaldi.SBFMReader_Next(reader)
- assert(kaldi.SBFMReader_Done(reader))
-
- kaldi.SBFMReader_Delete(reader)
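
Note on the removed example.py: the row-by-row copy that MatrixF_cpy_to_ptr performs can be illustrated without Kaldi. The sketch below is a hypothetical stand-in, not the removed code. The helper name `copy_strided_rows` and the padded `padded` buffer are invented for illustration, and it assumes a float32 matrix whose rows are stored with a stride larger than the column count.

```python
import ctypes

import numpy as np


def copy_strided_rows(src_ptr, num_rows, num_cols, src_stride):
    """Copy a row-padded float32 matrix into a dense NumPy array.

    src_ptr    -- ctypes pointer to the first float of the source
    src_stride -- distance between row starts, in floats (>= num_cols)
    """
    dst = np.empty((num_rows, num_cols), dtype=np.float32)
    item = ctypes.sizeof(ctypes.c_float)
    src_addr = ctypes.addressof(src_ptr.contents)
    for r in range(num_rows):
        # Copy only the num_cols valid floats of each row, skipping padding.
        ctypes.memmove(dst.ctypes.data + r * dst.strides[0],
                       src_addr + r * src_stride * item,
                       num_cols * item)
    return dst


# Hypothetical source: 2 rows x 2 cols stored with a stride of 4 floats.
padded = np.zeros((2, 4), dtype=np.float32)
padded[:, :2] = [[1.2345, 6.789], [-9.876, 0.0001]]
src = padded.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

feats = copy_strided_rows(src, num_rows=2, num_cols=2, src_stride=4)
print(feats)
```

Because the destination is allocated by NumPy and filled by an explicit copy, Python owns the resulting buffer, which is the ownership concern the comments in example.py raise.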
diff --git a/example/speech-demo/run_ami.sh b/example/speech-demo/run_ami.sh
deleted file mode 100755
index 0103fd1..0000000
--- a/example/speech-demo/run_ami.sh
+++ /dev/null
@@ -1,141 +0,0 @@
-#!/bin/bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-
-# This script trains and evaluates LSTM models. There is no
-# discriminative training yet.
-# In this recipe, MXNet reads Kaldi features and labels directly,
-# which makes the whole pipeline much simpler.
-
-set -e #Exit on non-zero return code from any command
-set -o pipefail #Exit if any command in the pipeline
- #returns a non-zero return code
-set -u #Fail on an undefined variable
-
-. ./cmd.sh
-. ./path.sh
-
-cmd=hostd3.pl
-# root folder,
-expdir=exp_mxnet
-
-##################################################
-# Kaldi generated folder
-##################################################
-
-# alignment folder
-ali_src=exp_cntk/sdm1/dnn_120fbank_ali
-
-# decoding graph
-graph_src=exp/sdm1/tri3a/graph_ami_fsh.o3g.kn.pr1-7/
-
-# features
-train_src=data/sdm1/train_fbank_gcmvn
-dev_src=data/sdm1/eval_fbank_gcmvn
-
-# config file
-config=ami_local_bptt.cfg
-
-# optional settings,
-njdec=128
-scoring="--min-lmwt 5 --max-lmwt 19"
-
-# The device number to run the training
-# change to AUTO to select the card automatically
-deviceNumber=gpu1
-
-# decoding method
-method=simple
-modelName=
-# model
-prefix=
-num_epoch=
-acwt=0.1
-#smbr training variables
-num_utts_per_iter=40
-smooth_factor=0.1
-use_one_sil=true
-
-stage=0
-. utils/parse_options.sh || exit 1;
-
-
-###############################################
-# Training
-###############################################
-
-mkdir -p $expdir
-dir=$expdir/data-for-mxnet
-
-# prepare listing data
-if [ $stage -le 0 ] ; then
- mkdir -p $dir
- mkdir -p $dir/log
- mkdir -p $dir/rawpost
-
- # for compressed ali
- #$cmd JOB=1:$njdec $dir/log/gen_post.JOB.log \
- # ali-to-pdf $ali_src/final.mdl "ark:gunzip -c $ali_src/ali.JOB.gz |" \
- # ark:- | ali-to-post ark:- ark,scp:$dir/rawpost/post.JOB.ark,$dir/rawpost/post.JOB.scp || exit 1;
- num=`cat $ali_src/num_jobs`
- $cmd JOB=1:$num $dir/log/gen_post.JOB.log \
- ali-to-pdf $ali_src/final.mdl ark:$ali_src/ali.JOB.ark \
- ark:- \| ali-to-post ark:- ark,scp:$dir/rawpost/post.JOB.ark,$dir/rawpost/post.JOB.scp || exit 1;
-
-
- for n in $(seq $num); do
- cat $dir/rawpost/post.${n}.scp || exit 1;
- done > $dir/post.scp
-fi
-
-if [ $stage -le 1 ] ; then
- # split the data : 90% train and 10% held-out
- [ ! -e ${train_src}_tr90 ] && utils/subset_data_dir_tr_cv.sh $train_src ${train_src}_tr90 ${train_src}_cv10
-
- # generate dataset list
- echo NO_FEATURE_TRANSFORM scp:${train_src}_tr90/feats.scp > $dir/train.feats
- echo scp:$dir/post.scp >> $dir/train.feats
-
- echo NO_FEATURE_TRANSFORM scp:${train_src}_cv10/feats.scp > $dir/dev.feats
- echo scp:$dir/post.scp >> $dir/dev.feats
-
- echo NO_FEATURE_TRANSFORM scp:${dev_src}/feats.scp > $dir/test.feats
-fi
-
-# generate label counts
-if [ $stage -le 2 ] ; then
- $cmd JOB=1:1 $dir/log/gen_label_mean.JOB.log \
- python make_stats.py --configfile $config --data_train $dir/train.feats \| copy-feats ark:- ark:$dir/label_mean.ark
- echo NO_FEATURE_TRANSFORM ark:$dir/label_mean.ark > $dir/label_mean.feats
-fi
-
-
-# training, note that weight decay is for the whole batch (0.00001 * 20 (minibatch) * 40 (batch_size))
-if [ $stage -le 3 ] ; then
- python train_lstm_proj.py --configfile $config --data_train $dir/train.feats --data_dev $dir/dev.feats --train_prefix $PWD/$expdir/$prefix --train_optimizer speechSGD --train_learning_rate 1 --train_context $deviceNumber --train_weight_decay 0.008 --train_show_every 1000
-fi
-
-# decoding
-if [ $stage -le 4 ] ; then
- cp $ali_src/final.mdl $expdir
- mxnet_string="OMP_NUM_THREADS=1 python decode_mxnet.py --config $config --data_test $dir/test.feats --data_label_mean $dir/label_mean.feats --train_method $method --train_prefix $PWD/$expdir/$prefix --train_num_epoch $num_epoch --train_context cpu0 --train_batch_size 1"
- ./decode_mxnet.sh --nj $njdec --cmd $decode_cmd --acwt $acwt --scoring-opts "$scoring" \
- $graph_src $dev_src $expdir/decode_${prefix}_$(basename $dev_src) "$mxnet_string" || exit 1;
-
-fi
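
Note on the weight-decay comment in the removed run_ami.sh: the value passed as --train_weight_decay follows from the arithmetic quoted in that comment. A minimal check is sketched below. The variable names are illustrative only, and the factors 20 and 40 are taken from the comment, not verified against the config file.

```python
import math

per_sample_wd = 0.00001   # per-sample coefficient quoted in the script comment
minibatch = 20            # factor labelled "minibatch" in the comment
batch_size = 40           # factor labelled "batch_size" in the comment

# Weight decay scaled up to the whole batch, as the comment describes.
batch_wd = per_sample_wd * minibatch * batch_size
print(batch_wd)

# Matches --train_weight_decay 0.008 up to floating-point rounding.
assert math.isclose(batch_wd, 0.008)
```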
diff --git a/example/speech-demo/run_timit.sh b/example/speech-demo/run_timit.sh
deleted file mode 100755
index 023ae6f..0000000
--- a/example/speech-demo/run_timit.sh
+++ /dev/null
@@ -1,141 +0,0 @@
-#!/bin/bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-
-# This script trains and evaluates LSTM models. There is no
-# discriminative training yet.
-# In this recipe, MXNet reads Kaldi features and labels directly,
-# which makes the whole pipeline much simpler.
-
-set -e #Exit on non-zero return code from any command
-set -o pipefail #Exit if any command in the pipeline
- #returns a non-zero return code
-set -u #Fail on an undefined variable
-
-. ./cmd.sh
-. ./path.sh
-
-cmd=run.pl
-# root folder,
-expdir=exp_timit
-
-##################################################
-# Kaldi generated folder
-##################################################
-
-# alignment folder
-ali_src=/home/sooda/speech/kaldi/egs/timit/s5/exp/tri3_ali
-
-# decoding graph
-graph_src=/home/sooda/speech/kaldi/egs/timit/s5/exp/tri3/graph
-
-# features
-train_src=/home/sooda/speech/kaldi/egs/timit/s5/data/train
-dev_src=/home/sooda/speech/kaldi/egs/timit/s5/data/dev
-
-# config file
-config=default_timit.cfg
-# optional settings,
-njdec=8
-scoring="--min-lmwt 5 --max-lmwt 19"
-
-# The device number to run the training
-# change to AUTO to select the card automatically
-deviceNumber=gpu0
-
-# decoding method
-method=simple
-modelName=
-# model
-prefix=timit
-num_epoch=12
-acwt=0.1
-#smbr training variables
-num_utts_per_iter=40
-smooth_factor=0.1
-use_one_sil=true
-
-stage=4
-. utils/parse_options.sh || exit 1;
-
-
-###############################################
-# Training
-###############################################
-
-mkdir -p $expdir
-dir=$expdir/data-for-mxnet
-
-# prepare listing data
-if [ $stage -le 0 ] ; then
- mkdir -p $dir
- mkdir -p $dir/log
- mkdir -p $dir/rawpost
-
- # for compressed ali
- num=`cat $ali_src/num_jobs`
- $cmd JOB=1:$num $dir/log/gen_post.JOB.log \
- ali-to-pdf $ali_src/final.mdl "ark:gunzip -c $ali_src/ali.JOB.gz |" \
- ark:- \| ali-to-post ark:- ark,scp:$dir/rawpost/post.JOB.ark,$dir/rawpost/post.JOB.scp || exit 1;
- #num=`cat $ali_src/num_jobs`
- #$cmd JOB=1:$num $dir/log/gen_post.JOB.log \
- # ali-to-pdf $ali_src/final.mdl ark:$ali_src/ali.JOB.ark \
- # ark:- \| ali-to-post ark:- ark,scp:$dir/rawpost/post.JOB.ark,$dir/rawpost/post.JOB.scp || exit 1;
-
-
- for n in $(seq $num); do
- cat $dir/rawpost/post.${n}.scp || exit 1;
- done > $dir/post.scp
-fi
-
-if [ $stage -le 1 ] ; then
- # split the data : 90% train and 10% held-out
- [ ! -e ${train_src}_tr90 ] && utils/subset_data_dir_tr_cv.sh $train_src ${train_src}_tr90 ${train_src}_cv10
-
- # generate dataset list
- echo NO_FEATURE_TRANSFORM scp:${train_src}_tr90/feats.scp > $dir/train.feats
- echo scp:$dir/post.scp >> $dir/train.feats
-
- echo NO_FEATURE_TRANSFORM scp:${train_src}_cv10/feats.scp > $dir/dev.feats
- echo scp:$dir/post.scp >> $dir/dev.feats
-
- echo NO_FEATURE_TRANSFORM scp:${dev_src}/feats.scp > $dir/test.feats
-fi
-
-# generate label counts
-if [ $stage -le 2 ] ; then
- $cmd JOB=1:1 $dir/log/gen_label_mean.JOB.log \
- python make_stats.py --configfile $config --data_train $dir/train.feats \| copy-feats ark:- ark:$dir/label_mean.ark
- echo NO_FEATURE_TRANSFORM ark:$dir/label_mean.ark > $dir/label_mean.feats
-fi
-
-
-# training, note that weight decay is for the whole batch (0.00001 * 20 (minibatch) * 40 (batch_size))
-if [ $stage -le 3 ] ; then
- python train_lstm_proj.py --configfile $config --data_train $dir/train.feats --data_dev $dir/dev.feats --train_prefix $PWD/$expdir/$prefix --train_optimizer speechSGD --train_learning_rate 1 --train_context $deviceNumber --train_weight_decay 0.008 --train_show_every 1000
-fi
-
-# decoding
-if [ $stage -le 4 ] ; then
- cp $ali_src/final.mdl $expdir
- mxnet_string="OMP_NUM_THREADS=1 python decode_mxnet.py --config $config --data_test $dir/test.feats --data_label_mean $dir/label_mean.feats --train_method $method --train_prefix $PWD/$expdir/$prefix --train_num_epoch $num_epoch --train_context cpu0 --train_batch_size 1"
- ./decode_mxnet.sh --nj $njdec --cmd $cmd --acwt $acwt --scoring-opts "$scoring" \
- $graph_src $dev_src $expdir/decode_${prefix}_$(basename $dev_src) "$mxnet_string" || exit 1;
-
-fi
diff --git a/example/speech-demo/speechSGD.py b/example/speech-demo/speechSGD.py
deleted file mode 100644
index 931f40a..0000000
--- a/example/speech-demo/speechSGD.py
+++ /dev/null
@@ -1,127 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import mxnet as mx
-
-from mxnet.ndarray import NDArray, zeros, clip, sqrt
-from mxnet.random import normal
-
-@mx.optimizer.register
-class speechSGD(mx.optimizer.Optimizer):
- """A very simple SGD optimizer with momentum and weight regularization.
-
- Parameters
- ----------
- learning_rate : float, optional
- learning_rate of SGD
-
- momentum : float, optional
- momentum value
-
- wd : float, optional
- L2 regularization coefficient added to all the weights
-
- rescale_grad : float, optional
- rescaling factor of gradient.
-
- clip_gradient : float, optional
- clip gradient in range [-clip_gradient, clip_gradient]
-
- param_idx2name : dict of string/int to float, optional
- applies special weight-decay handling to parameters whose names end with bias, gamma, or beta
- """
- def __init__(self, momentum=0.0, **kwargs):
- super(speechSGD, self).__init__(**kwargs)
- self.momentum = momentum
-
- def create_state(self, index, weight):
- """Create additional optimizer state such as momentum.
-
- Parameters
- ----------
- weight : NDArray
- The weight data
-
- """
- if self.momentum == 0.0:
- return None
- else:
- return zeros(weight.shape, weight.context, dtype=weight.dtype)
-
- def _get_lr(self, index):
- """Get the learning rate and momentum for index.
-
- Parameters
- ----------
- index : int
- The index for weight
-
- Returns
- -------
- lr : float
- learning rate for this index
- """
- mom = 0.0
- if self.lr_scheduler is not None:
- (lr, mom) = self.lr_scheduler(self.num_update)
- else:
- lr = self.lr
-
- if index in self.lr_mult:
- lr *= self.lr_mult[index]
- elif index in self.idx2name:
- lr *= self.lr_mult.get(self.idx2name[index], 1.0)
- return lr, mom
-
- def update(self, index, weight, grad, state):
- """Update the parameters.
-
- Parameters
- ----------
- index : int
- A unique integer key used to index the parameters
-
- weight : NDArray
- weight ndarray
-
- grad : NDArray
- grad ndarray
-
- state : NDArray or other objects returned by create_state
- The auxiliary state used in optimization.
- """
- assert(isinstance(weight, NDArray))
- assert(isinstance(grad, NDArray))
- (lr, momentum) = self._get_lr(index)
- wd = self._get_wd(index)
- self._update_count(index)
-
- grad = grad * self.rescale_grad
- if self.clip_gradient is not None:
- grad = clip(grad, -self.clip_gradient, self.clip_gradient)
-
- if state is not None:
- mom = state
- mom[:] *= momentum
- mom[:] += -lr * (1.0 - momentum) * (grad + wd * weight)
- weight[:] += mom
- else:
- assert self.momentum == 0.0
- weight[:] += -lr * (grad + wd * weight)
-
-
-
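
Note on the removed speechSGD.py: the momentum branch of update() scales the new gradient term by (1 - momentum) before adding it to the momentum buffer. The NumPy sketch below restates that rule outside MXNet. The function name `momentum_sgd_step` and the sample values are assumptions made for illustration and are not part of the removed file.

```python
import numpy as np


def momentum_sgd_step(weight, grad, mom, lr, momentum, wd=0.0,
                      rescale_grad=1.0, clip_gradient=None):
    """One in-place update following the rule in the removed update()."""
    grad = grad * rescale_grad
    if clip_gradient is not None:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # Damped momentum: the new gradient term is scaled by (1 - momentum).
    mom *= momentum
    mom += -lr * (1.0 - momentum) * (grad + wd * weight)
    weight += mom
    return weight, mom


w = np.array([1.0, -2.0])
m = np.zeros_like(w)
g = np.array([0.5, 0.5])
momentum_sgd_step(w, g, m, lr=0.1, momentum=0.9)
print(w, m)
```

With this damping, a constant gradient drives the momentum buffer toward -lr * (grad + wd * weight) rather than a multiple of it, which may be why the run scripts pass a learning rate as large as 1.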
diff --git a/example/speech-demo/tests/test_nothing.py b/example/speech-demo/tests/test_nothing.py
deleted file mode 100644
index d6e810f..0000000
--- a/example/speech-demo/tests/test_nothing.py
+++ /dev/null
@@ -1,19 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-def test_nothing():
- pass
diff --git a/example/speech-demo/tests/test_system.py b/example/speech-demo/tests/test_system.py
deleted file mode 100644
index a64879a..0000000
--- a/example/speech-demo/tests/test_system.py
+++ /dev/null
@@ -1,109 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from __future__ import print_function
-from pdnn.run_DNN import run_DNN
-from pdnn.run_RBM import run_RBM
-from pdnn.run_SDA import run_SDA
-from pdnn.eval_DNN import eval_DNN
-import json
-from utils.utils import setup_logger
-
-MNIST_CONF = json.load(open("configs/unittest_mnist.json"))
-MAX_ITERS = 2
-setup_logger(None)
-
-def banner(s):
- print("***********************" + s + "*************************")
-
-def test_hi():
- print("hi")
-
-def test_rbm_dnn():
- banner("rbm dnn")
- mnist_conf = MNIST_CONF.copy()
-
- mnist_conf["train_rbm"]["max_iters"] = MAX_ITERS
- run_RBM(mnist_conf)
-
- mnist_conf["train_dnn"]["max_iters"] = MAX_ITERS
- mnist_conf["init_dnn"] = {
- "filename": "temp/rbm/final.nnet",
- "num_hidden_layers": -1,
- "with_final": 1
- }
- run_DNN(mnist_conf)
-
- mnist_conf["init_rbm"] = {
- "filename": "temp/dnn/final.nnet",
- "num_hidden_layers": -1,
- "with_final": 1
- }
- mnist_conf["train_rbm"]["max_iters"] = 0
- run_RBM(mnist_conf)
-
-def test_sda_dnn():
- banner("sda dnn")
- mnist_conf = MNIST_CONF.copy()
-
- mnist_conf["train_sda"]["max_iters"] = MAX_ITERS
- run_SDA(mnist_conf)
-
- mnist_conf["train_dnn"]["max_iters"] = MAX_ITERS
- mnist_conf["init_dnn"] = {
- "filename": "temp/sda/final.nnet",
- "num_hidden_layers": -1,
- "with_final": 1
- }
- run_DNN(mnist_conf)
-
- mnist_conf["init_sda"] = {
- "filename": "temp/dnn/final.nnet",
- "num_hidden_layers": -1,
- "with_final": 1
- }
- mnist_conf["train_sda"]["max_iters"] = 1
- run_SDA(mnist_conf)
-
-def test_dnn_eval():
- banner("dnn cv")
- mnist_conf = MNIST_CONF.copy()
-
- mnist_conf["train_dnn"]["max_iters"] = MAX_ITERS
- run_DNN(mnist_conf)
-
- mnist_conf["init_dnn"] = {
- "filename": "temp/dnn/final.nnet",
- "num_hidden_layers": -1,
- "with_final": 1
- }
-
- # per-part
- eval_DNN(mnist_conf)
-
- mnist_conf["eval_dnn"] = {"mode": "cv", "batch_size": 1024}
- eval_DNN(mnist_conf)
-
- mnist_conf["eval_dnn"] = {"mode": "per-feat", "batch_size": 1024}
- eval_DNN(mnist_conf)
-
-def test_dropout():
- banner("dropout")
- mnist_conf = MNIST_CONF.copy()
- mnist_conf["train_dnn"]["max_iters"] = MAX_ITERS
- mnist_conf["model"]["dropout_factor"] = "0.4"
- run_DNN(mnist_conf)
diff --git a/example/speech-demo/train_lstm_proj.py b/example/speech-demo/train_lstm_proj.py
deleted file mode 100644
index 5749b0c..0000000
--- a/example/speech-demo/train_lstm_proj.py
+++ /dev/null
@@ -1,327 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-import re
-import sys
-sys.path.insert(0, "../../python")
-import time
-import logging
-import os.path
-
-import mxnet as mx
-import numpy as np
-from speechSGD import speechSGD
-from lstm_proj import lstm_unroll
-from io_util import BucketSentenceIter, TruncatedSentenceIter, DataReadStream
-from config_util import parse_args, get_checkpoint_path, parse_contexts
-
-
-# some constants
-METHOD_BUCKETING = 'bucketing'
-METHOD_TBPTT = 'truncated-bptt'
-
-def prepare_data(args):
- batch_size = args.config.getint('train', 'batch_size')
- num_hidden = args.config.getint('arch', 'num_hidden')
- num_hidden_proj = args.config.getint('arch', 'num_hidden_proj')
- num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
-
- init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
- if num_hidden_proj > 0:
- init_h = [('l%d_init_h'%l, (batch_size, num_hidden_proj)) for l in range(num_lstm_layer)]
- else:
- init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-
- init_states = init_c + init_h
-
- file_train = args.config.get('data', 'train')
- file_dev = args.config.get('data', 'dev')
- file_format = args.config.get('data', 'format')
- feat_dim = args.config.getint('data', 'xdim')
-
- train_data_args = {
- "gpu_chunk": 32768,
- "lst_file": file_train,
- "file_format": file_format,
- "separate_lines": True
- }
-
- dev_data_args = {
- "gpu_chunk": 32768,
- "lst_file": file_dev,
- "file_format": file_format,
- "separate_lines": True
- }
-
- train_sets = DataReadStream(train_data_args, feat_dim)
- dev_sets = DataReadStream(dev_data_args, feat_dim)
-
- return (init_states, train_sets, dev_sets)
-
-def CrossEntropy(labels, preds):
- labels = labels.reshape((-1,))
- preds = preds.reshape((-1, preds.shape[1]))
- loss = 0.
- num_inst = 0
- for i in range(preds.shape[0]):
- label = labels[i]
-
- if label > 0:
- loss += -np.log(max(1e-10, preds[i][int(label)]))
- num_inst += 1
- return loss , num_inst
-
-def Acc_exclude_padding(labels, preds):
- labels = labels.reshape((-1,))
- preds = preds.reshape((-1, preds.shape[1]))
- sum_metric = 0
- num_inst = 0
- for i in range(preds.shape[0]):
- pred_label = np.argmax(preds[i], axis=0)
- label = labels[i]
-
- ind = np.nonzero(label.flat)
- pred_label_real = pred_label.flat[ind]
- label_real = label.flat[ind]
- sum_metric += (pred_label_real == label_real).sum()
- num_inst += len(pred_label_real)
- return sum_metric, num_inst
-
-class SimpleLRScheduler(mx.lr_scheduler.LRScheduler):
- """A simple lr schedule that simply return `dynamic_lr`. We will set `dynamic_lr`
- dynamically based on performance on the validation set.
- """
- def __init__(self, dynamic_lr, effective_sample_count=1, momentum=0.9, optimizer="sgd"):
- super(SimpleLRScheduler, self).__init__()
- self.dynamic_lr = dynamic_lr
- self.effective_sample_count = effective_sample_count
- self.momentum = momentum
- self.optimizer = optimizer
-
- def __call__(self, num_update):
- if self.optimizer == "speechSGD":
- return self.dynamic_lr / self.effective_sample_count, self.momentum
- else:
- return self.dynamic_lr / self.effective_sample_count
-
-def score_with_state_forwarding(module, eval_data, eval_metric):
- eval_data.reset()
- eval_metric.reset()
-
- for eval_batch in eval_data:
- module.forward(eval_batch, is_train=False)
- module.update_metric(eval_metric, eval_batch.label)
-
- # copy over states
- outputs = module.get_outputs()
- # outputs[0] is softmax, 1:end are states
- for i in range(1, len(outputs)):
- outputs[i].copyto(eval_data.init_state_arrays[i-1])
-
-
-def get_initializer(args):
- init_type = getattr(mx.initializer, args.config.get('train', 'initializer'))
- init_scale = args.config.getfloat('train', 'init_scale')
- if init_type is mx.initializer.Xavier:
- return mx.initializer.Xavier(magnitude=init_scale)
- return init_type(init_scale)
-
-
-def do_training(training_method, args, module, data_train, data_val):
- from distutils.dir_util import mkpath
- mkpath(os.path.dirname(get_checkpoint_path(args)))
-
- batch_size = data_train.batch_size
- batch_end_callbacks = [mx.callback.Speedometer(batch_size,
- args.config.getint('train', 'show_every'))]
- eval_allow_extra = True if training_method == METHOD_TBPTT else False
- eval_metric = [mx.metric.np(CrossEntropy, allow_extra_outputs=eval_allow_extra),
- mx.metric.np(Acc_exclude_padding, allow_extra_outputs=eval_allow_extra)]
- eval_metric = mx.metric.create(eval_metric)
- optimizer = args.config.get('train', 'optimizer')
- momentum = args.config.getfloat('train', 'momentum')
- learning_rate = args.config.getfloat('train', 'learning_rate')
- lr_scheduler = SimpleLRScheduler(learning_rate, momentum=momentum, optimizer=optimizer)
-
- if training_method == METHOD_TBPTT:
- lr_scheduler.seq_len = data_train.truncate_len
-
- n_epoch = 0
- num_epoch = args.config.getint('train', 'num_epoch')
- learning_rate = args.config.getfloat('train', 'learning_rate')
- decay_factor = args.config.getfloat('train', 'decay_factor')
- decay_bound = args.config.getfloat('train', 'decay_lower_bound')
- clip_gradient = args.config.getfloat('train', 'clip_gradient')
- weight_decay = args.config.getfloat('train', 'weight_decay')
- if clip_gradient == 0:
- clip_gradient = None
-
- last_acc = -float("Inf")
- last_params = None
-
- module.bind(data_shapes=data_train.provide_data,
- label_shapes=data_train.provide_label,
- for_training=True)
- module.init_params(initializer=get_initializer(args))
-
- def reset_optimizer():
- if optimizer == "sgd" or optimizer == "speechSGD":
- module.init_optimizer(kvstore='device',
- optimizer=args.config.get('train', 'optimizer'),
- optimizer_params={'lr_scheduler': lr_scheduler,
- 'momentum': momentum,
- 'rescale_grad': 1.0,
- 'clip_gradient': clip_gradient,
- 'wd': weight_decay},
- force_init=True)
- else:
- module.init_optimizer(kvstore='device',
- optimizer=args.config.get('train', 'optimizer'),
- optimizer_params={'lr_scheduler': lr_scheduler,
- 'rescale_grad': 1.0,
- 'clip_gradient': clip_gradient,
- 'wd': weight_decay},
- force_init=True)
- reset_optimizer()
-
- while True:
- tic = time.time()
- eval_metric.reset()
-
- for nbatch, data_batch in enumerate(data_train):
- if training_method == METHOD_TBPTT:
- lr_scheduler.effective_sample_count = data_train.batch_size * truncate_len
- lr_scheduler.momentum = np.power(np.power(momentum, 1.0/(data_train.batch_size * truncate_len)), data_batch.effective_sample_count)
- else:
- if data_batch.effective_sample_count is not None:
- lr_scheduler.effective_sample_count = 1#data_batch.effective_sample_count
-
- module.forward_backward(data_batch)
- module.update()
- module.update_metric(eval_metric, data_batch.label)
-
- batch_end_params = mx.model.BatchEndParam(epoch=n_epoch, nbatch=nbatch,
- eval_metric=eval_metric,
- locals=None)
- for callback in batch_end_callbacks:
- callback(batch_end_params)
-
- if training_method == METHOD_TBPTT:
- # copy over states
- outputs = module.get_outputs()
- # outputs[0] is softmax, 1:end are states
- for i in range(1, len(outputs)):
- outputs[i].copyto(data_train.init_state_arrays[i-1])
-
- for name, val in eval_metric.get_name_value():
- logging.info('Epoch[%d] Train-%s=%f', n_epoch, name, val)
- toc = time.time()
- logging.info('Epoch[%d] Time cost=%.3f', n_epoch, toc-tic)
-
- data_train.reset()
-
- # test on eval data
- score_with_state_forwarding(module, data_val, eval_metric)
-
- # test whether we should decay learning rate
- curr_acc = None
- for name, val in eval_metric.get_name_value():
- logging.info("Epoch[%d] Dev-%s=%f", n_epoch, name, val)
- if name == 'CrossEntropy':
- curr_acc = val
- assert curr_acc is not None, 'cannot find Acc_exclude_padding in eval metric'
-
- if n_epoch > 0 and lr_scheduler.dynamic_lr > decay_bound and curr_acc > last_acc:
- logging.info('Epoch[%d] !!! Dev set performance drops, reverting this epoch',
- n_epoch)
- logging.info('Epoch[%d] !!! LR decay: %g => %g', n_epoch,
- lr_scheduler.dynamic_lr, lr_scheduler.dynamic_lr / float(decay_factor))
-
- lr_scheduler.dynamic_lr /= decay_factor
- # we reset the optimizer because the internal states (e.g. momentum)
- # might already be exploded, so we want to start from fresh
- reset_optimizer()
- module.set_params(*last_params)
- else:
- last_params = module.get_params()
- last_acc = curr_acc
- n_epoch += 1
-
- # save checkpoints
- mx.model.save_checkpoint(get_checkpoint_path(args), n_epoch,
- module.symbol, *last_params)
-
- if n_epoch == num_epoch:
- break
-
-if __name__ == '__main__':
- args = parse_args()
- args.config.write(sys.stdout)
-
- training_method = args.config.get('train', 'method')
- contexts = parse_contexts(args)
-
- init_states, train_sets, dev_sets = prepare_data(args)
- state_names = [x[0] for x in init_states]
-
- batch_size = args.config.getint('train', 'batch_size')
- num_hidden = args.config.getint('arch', 'num_hidden')
- num_hidden_proj = args.config.getint('arch', 'num_hidden_proj')
- num_lstm_layer = args.config.getint('arch', 'num_lstm_layer')
- feat_dim = args.config.getint('data', 'xdim')
- label_dim = args.config.getint('data', 'ydim')
-
- logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(message)s')
-
- if training_method == METHOD_BUCKETING:
- buckets = args.config.get('train', 'buckets')
- buckets = list(map(int, re.split(r'\W+', buckets)))
- data_train = BucketSentenceIter(train_sets, buckets, batch_size, init_states, feat_dim=feat_dim)
- data_val = BucketSentenceIter(dev_sets, buckets, batch_size, init_states, feat_dim=feat_dim)
-
- def sym_gen(seq_len):
- sym = lstm_unroll(num_lstm_layer, seq_len, feat_dim, num_hidden=num_hidden,
- num_label=label_dim, num_hidden_proj=num_hidden_proj)
- data_names = ['data'] + state_names
- label_names = ['softmax_label']
- return (sym, data_names, label_names)
-
- module = mx.mod.BucketingModule(sym_gen,
- default_bucket_key=data_train.default_bucket_key,
- context=contexts)
- do_training(training_method, args, module, data_train, data_val)
- elif training_method == METHOD_TBPTT:
- truncate_len = args.config.getint('train', 'truncate_len')
- data_train = TruncatedSentenceIter(train_sets, batch_size, init_states,
- truncate_len=truncate_len, feat_dim=feat_dim)
- data_val = TruncatedSentenceIter(dev_sets, batch_size, init_states,
- truncate_len=truncate_len, feat_dim=feat_dim,
- do_shuffling=False, pad_zeros=True)
- sym = lstm_unroll(num_lstm_layer, truncate_len, feat_dim, num_hidden=num_hidden,
- num_label=label_dim, output_states=True, num_hidden_proj=num_hidden_proj)
- data_names = [x[0] for x in data_train.provide_data]
- label_names = [x[0] for x in data_train.provide_label]
- module = mx.mod.Module(sym, context=contexts, data_names=data_names,
- label_names=label_names)
- do_training(training_method, args, module, data_train, data_val)
- else:
- raise RuntimeError('Unknown training method: %s' % training_method)
-
- print("="*80)
- print("Finished Training")
- print("="*80)
- args.config.write(sys.stdout)