Posted to commits@mxnet.apache.org by zh...@apache.org on 2018/06/28 21:45:24 UTC

[incubator-mxnet] branch master updated: Replace PTB dataset with Sherlock Holmes (#11435)

This is an automated email from the ASF dual-hosted git repository.

zhasheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
     new 52ce0c3  Replace PTB dataset with Sherlock Holmes (#11435)
52ce0c3 is described below

commit 52ce0c3def1ea63d90e5ca1b104b99d9cd1e0c1a
Author: Alex Li <li...@gmail.com>
AuthorDate: Thu Jun 28 14:45:16 2018 -0700

    Replace PTB dataset with Sherlock Holmes (#11435)
    
    * update rnn-time-major example
    
    * update rnn/bucketing example
    
    * update example/old
    
    * fix small bug in bucket_io.py (convert int to float)
    
    * fix parenthesis problem in bucket_io.py
    
    * update word_lm example
    
    * update perl example
    
    * update scala example
    
    * update readme
    
    * update sherlockholmes dataset license
    
    * fix the missed reference
    
    * fix more missed reference
    
    * one more reference cleaned
    
    * also fixed the Penn Treebank typo/casing
    
    * update bucketing.md example info
    
    * remove ancient example
---
 docs/faq/bucketing.md                              |   4 +-
 docs/model_zoo/index.md                            |   1 +
 example/model-parallel/lstm/README.md              |  13 -
 example/model-parallel/lstm/lstm.py                | 532 ---------------------
 example/model-parallel/lstm/lstm_ptb.py            | 128 -----
 ...{get_ptb_data.sh => get_sherlockholmes_data.sh} |   8 +-
 example/rnn-time-major/readme.md                   |  12 +-
 example/rnn-time-major/rnn_cell_demo.py            |   8 +-
 example/rnn/README.md                              |   2 +-
 example/rnn/bucketing/README.md                    |  10 +-
 example/rnn/bucketing/cudnn_rnn_bucketing.py       |   6 +-
 ...{get_ptb_data.sh => get_sherlockholmes_data.sh} |   8 +-
 example/rnn/bucketing/lstm_bucketing.py            |  10 +-
 example/rnn/old/README.md                          |   6 +-
 example/rnn/old/bucket_io.py                       |   2 +-
 ...{get_ptb_data.sh => get_sherlockholmes_data.sh} |   8 +-
 example/rnn/old/gru_bucketing.py                   |   8 +-
 example/rnn/old/lstm_bucketing.py                  |   8 +-
 .../rnn/old/{lstm_ptb.R => lstm_sherlockholmes.R}  |   8 +-
 example/rnn/old/rnn_cell_demo.py                   |   8 +-
 example/rnn/word_lm/README.md                      |   6 +-
 example/rnn/word_lm/get_ptb_data.sh                |  43 --
 .../word_lm/get_sherlockholmes_data.sh}            |   8 +-
 example/rnn/word_lm/train.py                       |   4 +-
 .../AI-MXNet/examples/cudnn_lstm_bucketing.pl      |   4 +-
 perl-package/AI-MXNet/examples/get_ptb_data.sh     |  43 --
 .../AI-MXNet/examples/get_sherlockholmes_data.sh   |   8 +-
 perl-package/AI-MXNet/examples/lstm_bucketing.pl   |   6 +-
 .../AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm      |   4 +-
 .../apache/mxnetexamples/rnn/LstmBucketing.scala   |   4 +-
 30 files changed, 81 insertions(+), 839 deletions(-)

diff --git a/docs/faq/bucketing.md b/docs/faq/bucketing.md
index 6bcf80f..dbfdedd 100644
--- a/docs/faq/bucketing.md
+++ b/docs/faq/bucketing.md
@@ -31,9 +31,9 @@ by maintaining the connection of the states and gradients through time.
 However, this implementation approach results in slow processing.
 This approach works with variable-length sequences. For more complicated models (e.g., translation that uses a sequence-to-sequence model), explicit unrolling is the easiest way. In this example, we introduce the MXNet APIs that allow us to implement bucketing.
 
-## Variable-length Sequence Training for PTB
+## Variable-length Sequence Training for Sherlock Holmes
 
-We use the [PennTreeBank language model example](https://github.com/dmlc/mxnet/tree/master/example/rnn) for this example. If you are not familiar with this example, see [this tutorial (in Julia)](http://dmlc.ml/mxnet/2015/11/15/char-lstm-in-julia.html) first.
+We use the [Sherlock Holmes language model example](https://github.com/dmlc/mxnet/tree/master/example/rnn) for this example. If you are not familiar with this example, see [this tutorial (in Julia)](http://dmlc.ml/mxnet/2015/11/15/char-lstm-in-julia.html) first.
 
 In this example, we use a simple architecture
 consisting of a word-embedding layer
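
(For readers of the updated FAQ text above: a minimal sketch of the bucketing setup it describes, with hypothetical layer sizes and bucket choices that are not part of this commit, is shown below.)

```python
import mxnet as mx

# Hypothetical sizes -- placeholders, not values taken from this commit.
buckets = [10, 20, 30, 40]
vocab_size, num_embed, num_hidden = 10000, 200, 200

def sym_gen(seq_len):
    """Build one unrolled network per bucket (i.e. per sequence length)."""
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('softmax_label')
    embed = mx.sym.Embedding(data=data, input_dim=vocab_size,
                             output_dim=num_embed, name='embed')
    stack = mx.rnn.SequentialRNNCell()
    stack.add(mx.rnn.LSTMCell(num_hidden=num_hidden, prefix='lstm_l0_'))
    stack.reset()
    outputs, _ = stack.unroll(seq_len, inputs=embed, merge_outputs=True)
    pred = mx.sym.Reshape(outputs, shape=(-1, num_hidden))
    pred = mx.sym.FullyConnected(data=pred, num_hidden=vocab_size, name='pred')
    label = mx.sym.Reshape(label, shape=(-1,))
    pred = mx.sym.SoftmaxOutput(data=pred, label=label, name='softmax')
    return pred, ('data',), ('softmax_label',)

# BucketingModule keeps one executor per bucket and picks the right one from
# each batch's bucket_key, which is what makes variable-length training cheap.
model = mx.mod.BucketingModule(sym_gen, default_bucket_key=max(buckets))
```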
diff --git a/docs/model_zoo/index.md b/docs/model_zoo/index.md
index 19811f2..f7d1479 100644
--- a/docs/model_zoo/index.md
+++ b/docs/model_zoo/index.md
@@ -56,6 +56,7 @@ For instructions on using these models, see [the python tutorial on using pre-tr
 MXNet supports many types of recurrent neural networks (RNNs), including Long Short-Term Memory ([LSTM](http://www.bioinf.jku.at/publications/older/2604.pdf))
 and Gated Recurrent Units (GRU) networks. Some available datasets include:
 
+* [Sherlock Holmes](http://www.gutenberg.org/cache/epub/1661/pg1661.txt): Text corpus with ~1 million words. The task is predicting downstream words/characters.
 * [Penn Treebank (PTB)](https://catalog.ldc.upenn.edu/LDC95T7): Text corpus with ~1 million words. Vocabulary is limited to 10,000 words. The task is predicting downstream words/characters.
 * [Shakespeare](http://cs.stanford.edu/people/karpathy/char-rnn/): Complete text from Shakespeare's works.
 * [IMDB reviews](https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3): 25,000 movie reviews, labeled as positive or negative
diff --git a/example/model-parallel/lstm/README.md b/example/model-parallel/lstm/README.md
deleted file mode 100644
index 6f31ff8..0000000
--- a/example/model-parallel/lstm/README.md
+++ /dev/null
@@ -1,13 +0,0 @@
-Model Parallel LSTM
-===================
-
-This is an example showing how to do model parallel LSTM in MXNet.
-
-We use [the PenTreeBank dataset](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/)
-in this example. Download the dataset with below command:
-
-`bash get_ptb_data.sh`
-
-This will download PenTreeBank dataset under `data` folder. Now, you can run the training as follows:
-
-`python lstm_ptb.py`
diff --git a/example/model-parallel/lstm/lstm.py b/example/model-parallel/lstm/lstm.py
deleted file mode 100644
index 75fa533..0000000
--- a/example/model-parallel/lstm/lstm.py
+++ /dev/null
@@ -1,532 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint:skip-file
-import sys
-sys.path.insert(0, "../../python")
-import mxnet as mx
-import numpy as np
-from collections import namedtuple
-import time
-import math
-LSTMState = namedtuple("LSTMState", ["c", "h"])
-LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias",
-                                     "h2h_weight", "h2h_bias"])
-LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol",
-                                     "init_states", "last_states",
-                                     "seq_data", "seq_labels", "seq_outputs",
-                                     "param_blocks"])
-
-def lstm(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.):
-    """LSTM Cell symbol"""
-    if dropout > 0.:
-        indata = mx.sym.Dropout(data=indata, p=dropout)
-    i2h = mx.sym.FullyConnected(data=indata,
-                                weight=param.i2h_weight,
-                                bias=param.i2h_bias,
-                                num_hidden=num_hidden * 4,
-                                name="t%d_l%d_i2h" % (seqidx, layeridx))
-    h2h = mx.sym.FullyConnected(data=prev_state.h,
-                                weight=param.h2h_weight,
-                                bias=param.h2h_bias,
-                                num_hidden=num_hidden * 4,
-                                name="t%d_l%d_h2h" % (seqidx, layeridx))
-    gates = i2h + h2h
-    slice_gates = mx.sym.SliceChannel(gates, num_outputs=4,
-                                      name="t%d_l%d_slice" % (seqidx, layeridx))
-    in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid")
-    in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh")
-    forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid")
-    out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid")
-    next_c = (forget_gate * prev_state.c) + (in_gate * in_transform)
-    next_h = out_gate * mx.sym.Activation(next_c, act_type="tanh")
-    return LSTMState(c=next_c, h=next_h)
-
-
-def lstm_unroll(num_lstm_layer, seq_len, input_size,
-                num_hidden, num_embed, num_label, dropout=0.,
-                concat_decode=True, use_loss=False):
-    """unrolled lstm network"""
-    # initialize the parameter symbols
-    with mx.AttrScope(ctx_group='embed'):
-        embed_weight=mx.sym.Variable("embed_weight")
-
-    with mx.AttrScope(ctx_group='decode'):
-        cls_weight = mx.sym.Variable("cls_weight")
-        cls_bias = mx.sym.Variable("cls_bias")
-
-    param_cells = []
-    last_states = []
-    for i in range(num_lstm_layer):
-        with mx.AttrScope(ctx_group='layer%d' % i):
-            param_cells.append(LSTMParam(i2h_weight = mx.sym.Variable("l%d_i2h_weight" % i),
-                                         i2h_bias = mx.sym.Variable("l%d_i2h_bias" % i),
-                                         h2h_weight = mx.sym.Variable("l%d_h2h_weight" % i),
-                                         h2h_bias = mx.sym.Variable("l%d_h2h_bias" % i)))
-            state = LSTMState(c=mx.sym.Variable("l%d_init_c" % i),
-                              h=mx.sym.Variable("l%d_init_h" % i))
-        last_states.append(state)
-    assert(len(last_states) == num_lstm_layer)
-
-    last_hidden = []
-    for seqidx in range(seq_len):
-        # embedding layer
-        with mx.AttrScope(ctx_group='embed'):
-            data = mx.sym.Variable("t%d_data" % seqidx)
-            hidden = mx.sym.Embedding(data=data, weight=embed_weight,
-                                      input_dim=input_size,
-                                      output_dim=num_embed,
-                                      name="t%d_embed" % seqidx)
-        # stack LSTM
-        for i in range(num_lstm_layer):
-            if i==0:
-                dp=0.
-            else:
-                dp = dropout
-            with mx.AttrScope(ctx_group='layer%d' % i):
-                next_state = lstm(num_hidden, indata=hidden,
-                                  prev_state=last_states[i],
-                                  param=param_cells[i],
-                                  seqidx=seqidx, layeridx=i, dropout=dp)
-                hidden = next_state.h
-                last_states[i] = next_state
-
-        # decoder
-        if dropout > 0.:
-            hidden = mx.sym.Dropout(data=hidden, p=dropout)
-        last_hidden.append(hidden)
-
-    out_prob = []
-    if not concat_decode:
-        for seqidx in range(seq_len):
-            with mx.AttrScope(ctx_group='decode'):
-                fc = mx.sym.FullyConnected(data=last_hidden[seqidx],
-                                           weight=cls_weight,
-                                           bias=cls_bias,
-                                           num_hidden=num_label,
-                                           name="t%d_cls" % seqidx)
-                label = mx.sym.Variable("t%d_label" % seqidx)
-                if use_loss:
-                    # Currently softmax_cross_entropy fails https://github.com/apache/incubator-mxnet/issues/6874
-                    # So, workaround for now to fix this example
-                    out = mx.symbol.softmax(data=fc)
-                    label = mx.sym.Reshape(label, shape=(-1,1))
-                    ce = - mx.sym.broadcast_add(mx.sym.broadcast_mul(label, mx.sym.log(out)),
-                                              mx.sym.broadcast_mul((1 - label), mx.sym.log(1 - out)))
-                    sm = mx.sym.MakeLoss(ce,  name="t%d_sm" % seqidx)
-                else:
-                    sm = mx.sym.SoftmaxOutput(data=fc, label=label, name="t%d_sm" % seqidx)
-                out_prob.append(sm)
-    else:
-        with mx.AttrScope(ctx_group='decode'):
-            concat = mx.sym.Concat(*last_hidden, dim = 0)
-            fc = mx.sym.FullyConnected(data=concat,
-                                       weight=cls_weight,
-                                       bias=cls_bias,
-                                       num_hidden=num_label)
-            label = mx.sym.Variable("label")
-            if use_loss:
-                # Currently softmax_cross_entropy fails https://github.com/apache/incubator-mxnet/issues/6874
-                # So, workaround for now to fix this example
-                out = mx.symbol.softmax(data=fc)
-                label = mx.sym.Reshape(label, shape=(-1, 1))
-                ce = mx.sym.broadcast_add(mx.sym.broadcast_mul(label, mx.sym.log(out)),
-                                              mx.sym.broadcast_mul((1 - label), mx.sym.log(1 - out)))
-                sm = mx.sym.MakeLoss(ce,  name="sm")
-            else:
-                sm = mx.sym.SoftmaxOutput(data=fc, label=label, name="sm")
-            out_prob = [sm]
-
-    for i in range(num_lstm_layer):
-        state = last_states[i]
-        state = LSTMState(c=mx.sym.BlockGrad(state.c, name="l%d_last_c" % i),
-                          h=mx.sym.BlockGrad(state.h, name="l%d_last_h" % i))
-        last_states[i] = state
-
-    unpack_c = [state.c for state in last_states]
-    unpack_h = [state.h for state in last_states]
-    list_all = out_prob + unpack_c + unpack_h
-    return mx.sym.Group(list_all)
-
-
-def is_param_name(name):
-    return name.endswith("weight") or name.endswith("bias") or\
-        name.endswith("gamma") or name.endswith("beta")
-
-
-def setup_rnn_model(default_ctx,
-                    num_lstm_layer, seq_len,
-                    num_hidden, num_embed, num_label,
-                    batch_size, input_size,
-                    initializer, dropout=0.,
-                    group2ctx=None, concat_decode=True,
-                    use_loss=False, buckets=None):
-    """set up rnn model with lstm cells"""
-    max_len = max(buckets)
-    max_rnn_exec = None
-    models = {}
-    buckets.reverse()
-    for bucket_key in buckets:
-        # bind max_len first
-        rnn_sym = lstm_unroll(num_lstm_layer=num_lstm_layer,
-                          num_hidden=num_hidden,
-                          seq_len=seq_len,
-                          input_size=input_size,
-                          num_embed=num_embed,
-                          num_label=num_label,
-                          dropout=dropout,
-                          concat_decode=concat_decode,
-                          use_loss=use_loss)
-        arg_names = rnn_sym.list_arguments()
-        internals = rnn_sym.get_internals()
-
-        input_shapes = {}
-        for name in arg_names:
-            if name.endswith("init_c") or name.endswith("init_h"):
-                input_shapes[name] = (batch_size, num_hidden)
-            elif name.endswith("data"):
-                input_shapes[name] = (batch_size, )
-            elif name == "label":
-                input_shapes[name] = (batch_size * seq_len, )
-            elif name.endswith("label"):
-                input_shapes[name] = (batch_size, )
-            else:
-                pass
-
-        arg_shape, out_shape, aux_shape = rnn_sym.infer_shape(**input_shapes)
-        # bind arrays
-        arg_arrays = []
-        args_grad = {}
-        for shape, name in zip(arg_shape, arg_names):
-            group = internals[name].attr("__ctx_group__")
-            ctx = group2ctx[group] if group is not None else default_ctx
-            arg_arrays.append(mx.nd.zeros(shape, ctx))
-            if is_param_name(name):
-                args_grad[name] = mx.nd.zeros(shape, ctx)
-            if not name.startswith("t"):
-                print("%s group=%s, ctx=%s" % (name, group, str(ctx)))
-
-        # bind with shared executor
-        rnn_exec = None
-        if max_len == bucket_key:
-              rnn_exec = rnn_sym.bind(default_ctx, args=arg_arrays,
-                                args_grad=args_grad,
-                                grad_req="add", group2ctx=group2ctx)
-              max_rnn_exec = rnn_exec
-        else:
-              assert max_rnn_exec is not None
-              rnn_exec = rnn_sym.bind(default_ctx, args=arg_arrays,
-                            args_grad=args_grad,
-                            grad_req="add", group2ctx=group2ctx,
-                            shared_exec = max_rnn_exec)
-
-        param_blocks = []
-        arg_dict = dict(zip(arg_names, rnn_exec.arg_arrays))
-        for i, name in enumerate(arg_names):
-            if is_param_name(name):
-                initializer(name, arg_dict[name])
-                param_blocks.append((i, arg_dict[name], args_grad[name], name))
-            else:
-                assert name not in args_grad
-
-        out_dict = dict(zip(rnn_sym.list_outputs(), rnn_exec.outputs))
-
-        init_states = [LSTMState(c=arg_dict["l%d_init_c" % i],
-                             h=arg_dict["l%d_init_h" % i]) for i in range(num_lstm_layer)]
-
-        seq_data = [rnn_exec.arg_dict["t%d_data" % i] for i in range(seq_len)]
-        # we don't need to store the last state
-        last_states = None
-
-        if concat_decode:
-            seq_outputs = [out_dict["sm_output"]]
-            seq_labels = [rnn_exec.arg_dict["label"]]
-        else:
-            seq_outputs = [out_dict["t%d_sm_output" % i] for i in range(seq_len)]
-            seq_labels = [rnn_exec.arg_dict["t%d_label" % i] for i in range(seq_len)]
-
-        model = LSTMModel(rnn_exec=rnn_exec, symbol=rnn_sym,
-                     init_states=init_states, last_states=last_states,
-                     seq_data=seq_data, seq_labels=seq_labels, seq_outputs=seq_outputs,
-                     param_blocks=param_blocks)
-        models[bucket_key] = model
-    buckets.reverse()
-    return models
-
-
-def set_rnn_inputs(m, X, begin):
-    seq_len = len(m.seq_data)
-    batch_size = m.seq_data[0].shape[0]
-    for seqidx in range(seq_len):
-        idx = (begin + seqidx) % X.shape[0]
-        next_idx = (begin + seqidx + 1) % X.shape[0]
-        x = X[idx, :]
-        y = X[next_idx, :]
-        mx.nd.array(x).copyto(m.seq_data[seqidx])
-        if len(m.seq_labels) == 1:
-            m.seq_labels[0][seqidx*batch_size : seqidx*batch_size+batch_size] = y
-        else:
-            m.seq_labels[seqidx][:] = y
-
-def set_rnn_inputs_from_batch(m, batch, batch_seq_length, batch_size):
-  X = batch.data
-  for seqidx in range(batch_seq_length):
-    idx = seqidx
-    next_idx = (seqidx + 1) % batch_seq_length
-    x = X[idx, :]
-    y = X[next_idx, :]
-    mx.nd.array(x).copyto(m.seq_data[seqidx])
-    if len(m.seq_labels) == 1:
-      m.seq_labels[0][seqidx*batch_size : seqidx*batch_size+batch_size] = y
-    else:
-      m.seq_labels[seqidx][:] = y
-
-def calc_nll_concat(seq_label_probs, batch_size):
-  return -np.sum(np.log(seq_label_probs.asnumpy())) / batch_size
-
-
-def calc_nll(seq_label_probs, batch_size, seq_len):
-  eps = 1e-10
-  nll = 0.
-  for seqidx in range(seq_len):
-    py = seq_label_probs[seqidx].asnumpy()
-    nll += -np.sum(np.log(np.maximum(py, eps))) / batch_size
-    return nll
-
-
-def train_lstm(model, X_train_batch, X_val_batch,
-               num_round, update_period, concat_decode, batch_size, use_loss,
-               optimizer='sgd', half_life=2,max_grad_norm = 5.0, **kwargs):
-    opt = mx.optimizer.create(optimizer,
-                              **kwargs)
-
-    updater = mx.optimizer.get_updater(opt)
-    epoch_counter = 0
-    #log_period = max(1000 / seq_len, 1)
-    log_period = 28
-    last_perp = 10000000.0
-
-    for iteration in range(num_round):
-        nbatch = 0
-        train_nll = 0
-        tic = time.time()
-        for data_batch in X_train_batch:
-            batch_seq_length = data_batch.bucket_key
-            m = model[batch_seq_length]
-            # reset init state
-            for state in m.init_states:
-              state.c[:] = 0.0
-              state.h[:] = 0.0
-
-            head_grad = []
-            if use_loss:
-              ctx = m.seq_outputs[0].context
-              head_grad = [mx.nd.ones((1,), ctx) for x in m.seq_outputs]
-
-            set_rnn_inputs_from_batch(m, data_batch, batch_seq_length, batch_size)
-
-            m.rnn_exec.forward(is_train=True)
-            # probability of each label class, used to evaluate nll
-            # Change back to individual ops to see if fine grained scheduling helps.
-            if not use_loss:
-                if concat_decode:
-                    seq_label_probs = mx.nd.choose_element_0index(m.seq_outputs[0], m.seq_labels[0])
-                else:
-                    seq_label_probs = [mx.nd.choose_element_0index(out, label).copyto(mx.cpu())
-                                       for out, label in zip(m.seq_outputs, m.seq_labels)]
-                m.rnn_exec.backward()
-            else:
-                seq_loss = [x.copyto(mx.cpu()) for x in m.seq_outputs]
-                m.rnn_exec.backward(head_grad)
-
-            # update epoch counter
-            epoch_counter += 1
-            if epoch_counter % update_period == 0:
-                # update parameters
-                norm = 0.
-                for idx, weight, grad, name in m.param_blocks:
-                    grad /= batch_size
-                    l2_norm = mx.nd.norm(grad).asscalar()
-                    norm += l2_norm*l2_norm
-                norm = math.sqrt(norm)
-                for idx, weight, grad, name in m.param_blocks:
-                    if norm > max_grad_norm:
-                        grad *= (max_grad_norm / norm)
-                    updater(idx, grad, weight)
-                    # reset gradient to zero
-                    grad[:] = 0.0
-            if not use_loss:
-                if concat_decode:
-                    train_nll += calc_nll_concat(seq_label_probs, batch_size)
-                else:
-                    train_nll += calc_nll(seq_label_probs, batch_size, batch_seq_length)
-            else:
-                train_nll += sum([x.sum().asscalar() for x in seq_loss]) / batch_size
-
-            nbatch += batch_size
-            toc = time.time()
-            if epoch_counter % log_period == 0:
-                print("Iter [%d] Train: Time: %.3f sec, NLL=%.3f, Perp=%.3f" % (
-                    epoch_counter, toc - tic, train_nll / nbatch, np.exp(train_nll / nbatch)))
-        # end of training loop
-        toc = time.time()
-        print("Iter [%d] Train: Time: %.3f sec, NLL=%.3f, Perp=%.3f" % (
-            iteration, toc - tic, train_nll / nbatch, np.exp(train_nll / nbatch)))
-
-        val_nll = 0.0
-        nbatch = 0
-        for data_batch in X_val_batch:
-            batch_seq_length = data_batch.bucket_key
-            m = model[batch_seq_length]
-
-            # validation set, reset states
-            for state in m.init_states:
-                state.h[:] = 0.0
-                state.c[:] = 0.0
-
-            set_rnn_inputs_from_batch(m, data_batch, batch_seq_length, batch_size)
-            m.rnn_exec.forward(is_train=False)
-
-            # probability of each label class, used to evaluate nll
-            if not use_loss:
-                if concat_decode:
-                    seq_label_probs = mx.nd.choose_element_0index(m.seq_outputs[0], m.seq_labels[0])
-                else:
-                    seq_label_probs = [mx.nd.choose_element_0index(out, label).copyto(mx.cpu())
-                                       for out, label in zip(m.seq_outputs, m.seq_labels)]
-            else:
-                seq_loss = [x.copyto(mx.cpu()) for x in m.seq_outputs]
-
-            if not use_loss:
-                if concat_decode:
-                    val_nll += calc_nll_concat(seq_label_probs, batch_size)
-                else:
-                    val_nll += calc_nll(seq_label_probs, batch_size, batch_seq_length)
-            else:
-                val_nll += sum([x.sum().asscalar() for x in seq_loss]) / batch_size
-            nbatch += batch_size
-
-        perp = np.exp(val_nll / nbatch)
-        print("Iter [%d] Val: NLL=%.3f, Perp=%.3f" % (
-            iteration, val_nll / nbatch, np.exp(val_nll / nbatch)))
-        if last_perp - 1.0 < perp:
-            opt.lr *= 0.5
-            print("Reset learning rate to %g" % opt.lr)
-        last_perp = perp
-        X_val_batch.reset()
-        X_train_batch.reset()
-
-# is this function being used?
-def setup_rnn_sample_model(ctx,
-                           params,
-                           num_lstm_layer,
-                           num_hidden, num_embed, num_label,
-                           batch_size, input_size):
-    seq_len = 1
-    rnn_sym = lstm_unroll(num_lstm_layer=num_lstm_layer,
-                          input_size=input_size,
-                          num_hidden=num_hidden,
-                          seq_len=seq_len,
-                          num_embed=num_embed,
-                          num_label=num_label)
-    arg_names = rnn_sym.list_arguments()
-    input_shapes = {}
-    for name in arg_names:
-        if name.endswith("init_c") or name.endswith("init_h"):
-            input_shapes[name] = (batch_size, num_hidden)
-        elif name.endswith("data"):
-            input_shapes[name] = (batch_size, )
-        else:
-            pass
-    arg_shape, out_shape, aux_shape = rnn_sym.infer_shape(**input_shapes)
-    arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
-    arg_dict = dict(zip(arg_names, arg_arrays))
-    for name, arr in params.items():
-        arg_dict[name][:] = arr
-    rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays, args_grad=None, grad_req="null")
-    out_dict = dict(zip(rnn_sym.list_outputs(), rnn_exec.outputs))
-    param_blocks = []
-    params_array = list(params.items())
-    for i in range(len(params)):
-        param_blocks.append((i, params_array[i][1], None, params_array[i][0]))
-    init_states = [LSTMState(c=arg_dict["l%d_init_c" % i],
-                             h=arg_dict["l%d_init_h" % i]) for i in range(num_lstm_layer)]
-
-    if concat_decode:
-        seq_labels = [rnn_exec.arg_dict["label"]]
-        seq_outputs = [out_dict["sm_output"]]
-    else:
-        seq_labels = [rnn_exec.arg_dict["t%d_label" % i] for i in range(seq_len)]
-        seq_outputs = [out_dict["t%d_sm" % i] for i in range(seq_len)]
-
-    seq_data = [rnn_exec.arg_dict["t%d_data" % i] for i in range(seq_len)]
-    last_states = [LSTMState(c=out_dict["l%d_last_c_output" % i],
-                             h=out_dict["l%d_last_h_output" % i]) for i in range(num_lstm_layer)]
-
-    return LSTMModel(rnn_exec=rnn_exec, symbol=rnn_sym,
-                     init_states=init_states, last_states=last_states,
-                     seq_data=seq_data, seq_labels=seq_labels, seq_outputs=seq_outputs,
-                     param_blocks=param_blocks)
-
-# Python3 np.random.choice is too strict in eval float probability so we use an alternative
-import random
-import bisect
-import collections
-
-def _cdf(weights):
-    total = sum(weights)
-    result = []
-    cumsum = 0
-    for w in weights:
-        cumsum += w
-        result.append(cumsum / total)
-    return result
-
-def _choice(population, weights):
-    assert len(population) == len(weights)
-    cdf_vals = _cdf(weights)
-    x = random.random()
-    idx = bisect.bisect(cdf_vals, x)
-    return population[idx]
-
-def sample_lstm(model, X_input_batch, seq_len, temperature=1., sample=True):
-    m = model
-    vocab = m.seq_outputs.shape[1]
-    batch_size = m.seq_data[0].shape[0]
-    outputs_ndarray = mx.nd.zeros(m.seq_outputs.shape)
-    outputs_batch = []
-    tmp = [i for i in range(vocab)]
-    for i in range(seq_len):
-        outputs_batch.append(np.zeros(X_input_batch.shape))
-    for i in range(seq_len):
-        set_rnn_inputs(m, X_input_batch, 0)
-        m.rnn_exec.forward(is_train=False)
-        outputs_ndarray[:] = m.seq_outputs
-        for init, last in zip(m.init_states, m.last_states):
-            last.c.copyto(init.c)
-            last.h.copyto(init.h)
-        prob = np.clip(outputs_ndarray.asnumpy(), 1e-6, 1 - 1e-6)
-        if sample:
-            rescale = np.exp(np.log(prob) / temperature)
-            for j in range(batch_size):
-                p = rescale[j, :]
-                p[:] /= p.sum()
-                outputs_batch[i][j] = _choice(tmp, p)
-        else:
-            outputs_batch[i][:] = np.argmax(prob, axis=1)
-        X_input_batch[:] = outputs_batch[i]
-    return outputs_batch
diff --git a/example/model-parallel/lstm/lstm_ptb.py b/example/model-parallel/lstm/lstm_ptb.py
deleted file mode 100644
index 965ba19..0000000
--- a/example/model-parallel/lstm/lstm_ptb.py
+++ /dev/null
@@ -1,128 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint:skip-file
-import lstm
-import sys
-sys.path.insert(0, "../../python")
-import mxnet as mx
-import numpy as np
-# reuse the bucket_io library
-sys.path.insert(0, "../../rnn/old")
-from bucket_io import BucketSentenceIter, default_build_vocab
-
-"""
-PennTreeBank Language Model
-We would like to thanks Wojciech Zaremba for his Torch LSTM code
-
-The data file can be found at:
-https://github.com/dmlc/web-data/tree/master/mxnet/ptb
-"""
-
-def load_data(path, dic=None):
-    fi = open(path)
-    content = fi.read()
-    content = content.replace('\n', '<eos>')
-    content = content.split(' ')
-    print("Loading %s, size of data = %d" % (path, len(content)))
-    x = np.zeros(len(content))
-    if dic is None:
-        dic = {}
-    idx = 0
-    for i in range(len(content)):
-        word = content[i]
-        if len(word) == 0:
-            continue
-        if not word in dic:
-            dic[word] = idx
-            idx += 1
-        x[i] = dic[word]
-    print("Unique token: %d" % len(dic))
-    return x, dic
-
-def drop_tail(X, seq_len):
-    shape = X.shape
-    nstep = int(shape[0] / seq_len)
-    return X[0:(nstep * seq_len), :]
-
-
-def replicate_data(x, batch_size):
-    nbatch = int(x.shape[0] / batch_size)
-    x_cut = x[:nbatch * batch_size]
-    data = x_cut.reshape((nbatch, batch_size), order='F')
-    return data
-
-batch_size = 20
-seq_len = 35
-num_hidden = 400
-num_embed = 200
-num_lstm_layer = 8
-num_round = 25
-learning_rate= 0.1
-wd=0.
-momentum=0.0
-max_grad_norm = 5.0
-update_period = 1
-
-dic = default_build_vocab("./data/ptb.train.txt")
-vocab = len(dic)
-
-# static buckets
-buckets = [8, 16, 24, 32, 60]
-
-init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-init_states = init_c + init_h
-
-X_train_batch = BucketSentenceIter("./data/ptb.train.txt", dic,
-                                        buckets, batch_size, init_states, model_parallel=True)
-X_val_batch = BucketSentenceIter("./data/ptb.valid.txt", dic,
-                                      buckets, batch_size, init_states, model_parallel=True)
-
-ngpu = 2
-# A simple two GPU placement plan
-group2ctx = {'embed': mx.gpu(0),
-             'decode': mx.gpu(ngpu - 1)}
-
-for i in range(num_lstm_layer):
-    group2ctx['layer%d' % i] = mx.gpu(i * ngpu // num_lstm_layer)
-
-# whether do group-wise concat
-concat_decode = False
-use_loss=True
-model = lstm.setup_rnn_model(mx.gpu(), group2ctx=group2ctx,
-                             concat_decode=concat_decode,
-                             use_loss=use_loss,
-                             num_lstm_layer=num_lstm_layer,
-                             seq_len=X_train_batch.default_bucket_key,
-                             num_hidden=num_hidden,
-                             num_embed=num_embed,
-                             num_label=vocab,
-                             batch_size=batch_size,
-                             input_size=vocab,
-                             initializer=mx.initializer.Uniform(0.1),dropout=0.5, buckets=buckets)
-
-lstm.train_lstm(model, X_train_batch, X_val_batch,
-                num_round=num_round,
-                concat_decode=concat_decode,
-                use_loss=use_loss,
-                half_life=2,
-                max_grad_norm = max_grad_norm,
-                update_period=update_period,
-                learning_rate=learning_rate,
-                batch_size = batch_size,
-                wd=wd)
diff --git a/example/rnn-time-major/get_ptb_data.sh b/example/rnn-time-major/get_sherlockholmes_data.sh
similarity index 85%
copy from example/rnn-time-major/get_ptb_data.sh
copy to example/rnn-time-major/get_sherlockholmes_data.sh
index 2dc4034..43c8669 100755
--- a/example/rnn-time-major/get_ptb_data.sh
+++ b/example/rnn-time-major/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn-time-major/readme.md b/example/rnn-time-major/readme.md
index 7983fe8..b30b841 100644
--- a/example/rnn-time-major/readme.md
+++ b/example/rnn-time-major/readme.md
@@ -7,18 +7,18 @@ As example of Batch-major RNN is available in MXNet [RNN Bucketing example](http
 	
 ## Running the example
 - Prerequisite: an instance with GPU compute resources is required to run MXNet RNN
-- Make the shell script ```get_ptb_data.sh``` executable:
+- Make the shell script ```get_sherlockholmes_data.sh``` executable:
     ```bash 
-    chmod +x get_ptb_data.sh
+    chmod +x get_sherlockholmes_data.sh
     ```
-- Run ```get_ptb_data.sh``` to download the PTB dataset, and follow the instructions to review the license:
+- Run ```get_sherlockholmes_data.sh``` to download the Sherlock Holmes dataset, and follow the instructions to review the license:
     ```bash
-    ./get_ptb_data.sh
+    ./get_sherlockholmes_data.sh
     ```
-    The PTB data sets will be downloaded into ./data directory, and available for the example to train on.
+    The Sherlock Holmes data sets will be downloaded into the ./data directory and will be available for the example to train on.
 - Run the example:
     ```bash
-    python python rnn_cell_demo.py
+    python rnn_cell_demo.py
     ```
     
     If everything goes well, console will plot training speed and perplexity that you can compare to the batch major RNN.
diff --git a/example/rnn-time-major/rnn_cell_demo.py b/example/rnn-time-major/rnn_cell_demo.py
index cf1e0a0..80b281b 100644
--- a/example/rnn-time-major/rnn_cell_demo.py
+++ b/example/rnn-time-major/rnn_cell_demo.py
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-"""A simple demo of new RNN cell with PTB language model."""
+"""A simple demo of new RNN cell with Sherlock Holmes language model."""
 
 ################################################################################
 # Speed test (time major is 1.5~2 times faster than batch major).
@@ -93,16 +93,16 @@ if __name__ == '__main__':
     gpu_count = 1
     contexts = [mx.context.gpu(i) for i in range(gpu_count)]
 
-    vocab = default_build_vocab(os.path.join(data_dir, 'ptb.train.txt'))
+    vocab = default_build_vocab(os.path.join(data_dir, 'sherlockholmes.train.txt'))
 
     init_h = [mx.io.DataDesc('LSTM_state', (num_lstm_layer, batch_size, num_hidden), layout='TNC')]
     init_c = [mx.io.DataDesc('LSTM_state_cell', (num_lstm_layer, batch_size, num_hidden), layout='TNC')]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter(os.path.join(data_dir, 'ptb.train.txt'),
+    data_train = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.train.txt'),
                                     vocab, buckets, batch_size, init_states,
                                     time_major=True)
-    data_val = BucketSentenceIter(os.path.join(data_dir, 'ptb.valid.txt'),
+    data_val = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.valid.txt'),
                                   vocab, buckets, batch_size, init_states,
                                   time_major=True)
 
diff --git a/example/rnn/README.md b/example/rnn/README.md
index f0d80c3..a0846fa 100644
--- a/example/rnn/README.md
+++ b/example/rnn/README.md
@@ -8,7 +8,7 @@ Here is a short overview of what is in this directory.
 
 Directory | What's in it?
 --- | ---
-`word_lm/` | Language model trained on the PTB dataset achieving state of the art performance
+`word_lm/` | Language model trained on the Sherlock Holmes dataset achieving state of the art performance
 `bucketing/` | Language model with bucketing API with python
 `bucket_R/` | Language model with bucketing API with R
 `old/` | Language model trained with low level symbol interface (deprecated)
diff --git a/example/rnn/bucketing/README.md b/example/rnn/bucketing/README.md
index b46642b..9bbeefd 100644
--- a/example/rnn/bucketing/README.md
+++ b/example/rnn/bucketing/README.md
@@ -3,13 +3,13 @@ RNN Example
 This folder contains RNN examples using high level mxnet.rnn interface.
 
 ## Data
-1) Review the license for the PenTreeBank dataset and ensure that you agree to it. Then uncomment the lines in the 'get_ptb_data.sh' script that download the dataset.
+1) Review the license for the Sherlock Holmes dataset and ensure that you agree to it. Then uncomment the lines in the 'get_sherlockholmes_data.sh' script that download the dataset.
 
-2) Run `get_ptb_data.sh` to download PenTreeBank data.
+2) Run `get_sherlockholmes_data.sh` to download Sherlock Holmes data.
 
 ## Python
 
-- Generate the PennTreeBank language model by using LSTM:
+- Generate the Sherlock Holmes language model by using LSTM:
 
   For Python2 (CPU support): can take 2+ hours on AWS-EC2-p2.16xlarge
 
@@ -23,11 +23,11 @@ This folder contains RNN examples using high level mxnet.rnn interface.
 
   For Python2 (GPU support only): can take 50+ minutes on AWS-EC2-p2.16xlarge
 
-      $ python  --gpus 0,1,2,3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) 
+      $ python [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) --gpus 0,1,2,3
 
   For Python3 (GPU support only): can take 50+ minutes on AWS-EC2-p2.16xlarge
 
-      $ python3 --gpus 0,1,2,3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) 
+      $ python3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) --gpus 0,1,2,3
 
 
 ### Performance Note:
diff --git a/example/rnn/bucketing/cudnn_rnn_bucketing.py b/example/rnn/bucketing/cudnn_rnn_bucketing.py
index 5825290..66d5a55 100644
--- a/example/rnn/bucketing/cudnn_rnn_bucketing.py
+++ b/example/rnn/bucketing/cudnn_rnn_bucketing.py
@@ -19,7 +19,7 @@ import numpy as np
 import mxnet as mx
 import argparse
 
-parser = argparse.ArgumentParser(description="Train RNN on Penn Tree Bank",
+parser = argparse.ArgumentParser(description="Train RNN on Sherlock Holmes",
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('--test', default=False, action='store_true',
                     help='whether to do testing instead of training')
@@ -81,9 +81,9 @@ def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
     return sentences, vocab
 
 def get_data(layout):
-    train_sent, vocab = tokenize_text("./data/ptb.train.txt", start_label=start_label,
+    train_sent, vocab = tokenize_text("./data/sherlockholmes.train.txt", start_label=start_label,
                                       invalid_label=invalid_label)
-    val_sent, _ = tokenize_text("./data/ptb.test.txt", vocab=vocab, start_label=start_label,
+    val_sent, _ = tokenize_text("./data/sherlockholmes.test.txt", vocab=vocab, start_label=start_label,
                                 invalid_label=invalid_label)
 
     data_train  = mx.rnn.BucketSentenceIter(train_sent, args.batch_size, buckets=buckets,
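
(A side note on the pattern preserved by this hunk: the vocabulary is built from the training split and then reused for the validation split. A minimal, self-contained sketch of that flow with mx.rnn.encode_sentences, using made-up sentences instead of the downloaded files, follows.)

```python
import mxnet as mx

start_label, invalid_label = 1, 0

def tokenize(lines, vocab=None):
    # encode_sentences turns word lists into integer ids; it builds a new
    # vocabulary when none is given, otherwise it reuses the one passed in.
    return mx.rnn.encode_sentences([line.split() for line in lines],
                                   vocab=vocab,
                                   invalid_label=invalid_label,
                                   start_label=start_label)

train_lines = ["my dear watson", "the game is afoot"]   # stand-in training text
val_lines   = ["my dear watson"]                        # stand-in validation text

train_sent, vocab = tokenize(train_lines)         # vocabulary built from training data
val_sent, _       = tokenize(val_lines, vocab)    # same vocabulary reused for validation
```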
diff --git a/example/rnn/bucketing/get_ptb_data.sh b/example/rnn/bucketing/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn/bucketing/get_ptb_data.sh
rename to example/rnn/bucketing/get_sherlockholmes_data.sh
index 2dc4034..43c8669 100755
--- a/example/rnn/bucketing/get_ptb_data.sh
+++ b/example/rnn/bucketing/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/bucketing/lstm_bucketing.py b/example/rnn/bucketing/lstm_bucketing.py
index 0e7f064..7f15010 100644
--- a/example/rnn/bucketing/lstm_bucketing.py
+++ b/example/rnn/bucketing/lstm_bucketing.py
@@ -20,7 +20,7 @@ import mxnet as mx
 import argparse
 import os
 
-parser = argparse.ArgumentParser(description="Train RNN on Penn Tree Bank",
+parser = argparse.ArgumentParser(description="Train RNN on Sherlock Holmes",
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('--num-layers', type=int, default=2,
                     help='number of stacked RNN layers')
@@ -50,7 +50,7 @@ parser.add_argument('--disp-batches', type=int, default=50,
 
 def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
     if not os.path.isfile(fname):
-        raise IOError("Please use get_ptb_data.sh to download requied file (data/ptb.train.txt)")
+        raise IOError("Please use get_sherlockholmes_data.sh to download the required file (data/sherlockholmes.train.txt)")
     lines = open(fname).readlines()
     lines = [filter(None, i.split(' ')) for i in lines]
     sentences, vocab = mx.rnn.encode_sentences(lines, vocab=vocab, invalid_label=invalid_label,
@@ -71,9 +71,9 @@ if __name__ == '__main__':
     start_label = 1
     invalid_label = 0
 
-    train_sent, vocab = tokenize_text("./data/ptb.train.txt", start_label=start_label,
+    train_sent, vocab = tokenize_text("./data/sherlockholmes.train.txt", start_label=start_label,
                                       invalid_label=invalid_label)
-    val_sent, _ = tokenize_text("./data/ptb.test.txt", vocab=vocab, start_label=start_label,
+    val_sent, _ = tokenize_text("./data/sherlockholmes.test.txt", vocab=vocab, start_label=start_label,
                                 invalid_label=invalid_label)
 
     data_train  = mx.rnn.BucketSentenceIter(train_sent, args.batch_size, buckets=buckets,
@@ -92,7 +92,7 @@ if __name__ == '__main__':
                                  output_dim=args.num_embed, name='embed')
 
         stack.reset()
-        outputs, states = stack.unroll(seq_len, inputs=embed, merge_outputs=True)
+        outputs = stack.unroll(seq_len, inputs=embed, merge_outputs=True)[0]
 
         pred = mx.sym.Reshape(outputs, shape=(-1, args.num_hidden))
         pred = mx.sym.FullyConnected(data=pred, num_hidden=len(vocab), name='pred')
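
(On the [0] indexing introduced above: unroll returns an (outputs, states) pair, and with merge_outputs=True the first element is a single merged symbol; the final states are simply unused here. A tiny sketch of that return value, with hypothetical sizes:)

```python
import mxnet as mx

embed = mx.sym.Variable('embed')                  # stand-in for the embedding output
cell = mx.rnn.LSTMCell(num_hidden=8, prefix='lstm_')
cell.reset()

result = cell.unroll(5, inputs=embed, merge_outputs=True)
outputs, states = result                          # unroll returns (outputs, states)
assert result[0] is outputs                       # so [0] keeps only the merged outputs
```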
diff --git a/example/rnn/old/README.md b/example/rnn/old/README.md
index c03b36a..5d73523 100644
--- a/example/rnn/old/README.md
+++ b/example/rnn/old/README.md
@@ -3,14 +3,14 @@ RNN Example
 This folder contains RNN examples using low level symbol interface.
 
 ## Data
-Run `get_ptb_data.sh` to download PenTreeBank data.
+Run `get_sherlockholmes_data.sh` to download Sherlock Holmes data.
 
 ## Python
 
 - [lstm.py](lstm.py) Functions for building a LSTM Network
 - [gru.py](gru.py) Functions for building a GRU Network
-- [lstm_bucketing.py](lstm_bucketing.py) PennTreeBank language model by using LSTM
-- [gru_bucketing.py](gru_bucketing.py) PennTreeBank language model by using GRU
+- [lstm_bucketing.py](lstm_bucketing.py) Sherlock Holmes language model by using LSTM
+- [gru_bucketing.py](gru_bucketing.py) Sherlock Holmes language model by using GRU
 - [char-rnn.ipynb](char-rnn.ipynb) Notebook to demo how to train a character LSTM by using ```lstm.py```
 
 
diff --git a/example/rnn/old/bucket_io.py b/example/rnn/old/bucket_io.py
index 21f96ef..96c63fb 100644
--- a/example/rnn/old/bucket_io.py
+++ b/example/rnn/old/bucket_io.py
@@ -202,7 +202,7 @@ class BucketSentenceIter(mx.io.DataIter):
         # truncate each bucket into multiple of batch-size
         bucket_n_batches = []
         for i in range(len(self.data)):
-            bucket_n_batches.append(len(self.data[i]) / self.batch_size)
+            bucket_n_batches.append(np.floor((self.data[i]) / self.batch_size))
             self.data[i] = self.data[i][:int(bucket_n_batches[i]*self.batch_size)]
 
         bucket_plan = np.hstack([np.zeros(n, int)+i for i, n in enumerate(bucket_n_batches)])
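
(For context on the hunk above: bucket_n_batches is meant to hold, for each bucket, the number of whole batches that bucket contains, so the bucket can be truncated to a multiple of the batch size. A standalone sketch of that bookkeeping with made-up data, not the example's own code:)

```python
batch_size = 32
bucketed_sentences = [list(range(100)), list(range(70))]   # two hypothetical buckets

# Whole batches per bucket, then truncate each bucket to a multiple of batch_size.
bucket_n_batches = [len(bucket) // batch_size for bucket in bucketed_sentences]
truncated = [bucket[:n * batch_size]
             for bucket, n in zip(bucketed_sentences, bucket_n_batches)]

print(bucket_n_batches)             # [3, 2]
print([len(b) for b in truncated])  # [96, 64] -- both multiples of 32
```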
diff --git a/example/rnn/old/get_ptb_data.sh b/example/rnn/old/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn/old/get_ptb_data.sh
rename to example/rnn/old/get_sherlockholmes_data.sh
index 2dc4034..43c8669 100755
--- a/example/rnn/old/get_ptb_data.sh
+++ b/example/rnn/old/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/old/gru_bucketing.py b/example/rnn/old/gru_bucketing.py
index 226018c..b9f651a 100644
--- a/example/rnn/old/gru_bucketing.py
+++ b/example/rnn/old/gru_bucketing.py
@@ -23,7 +23,7 @@ import numpy as np
 import mxnet as mx
 
 from gru import gru_unroll
-from bucket_io import BucketSentenceIter, default_build_vocab
+from bucket_io import BucketSentenceIter, default_build_vocab, DummyIter
 
 def Perplexity(label, pred):
     label = label.T.reshape((-1,))
@@ -51,7 +51,7 @@ if __name__ == '__main__':
     #contexts = [mx.context.gpu(i) for i in range(1)]
     contexts = mx.context.cpu()
 
-    vocab = default_build_vocab("./data/ptb.train.txt")
+    vocab = default_build_vocab("./data/sherlockholmes.train.txt")
 
     def sym_gen(seq_len):
         return gru_unroll(num_lstm_layer, seq_len, len(vocab),
@@ -60,9 +60,9 @@ if __name__ == '__main__':
 
     init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
 
-    data_train = BucketSentenceIter("./data/ptb.train.txt", vocab,
+    data_train = BucketSentenceIter("./data/sherlockholmes.train.txt", vocab,
                                     buckets, batch_size, init_h)
-    data_val = BucketSentenceIter("./data/ptb.valid.txt", vocab,
+    data_val = BucketSentenceIter("./data/sherlockholmes.valid.txt", vocab,
                                   buckets, batch_size, init_h)
 
     if dummy_data:
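
(The DummyIter import added above is used by the dummy_data branch: it wraps a real iterator and keeps serving one cached batch so training speed can be measured without data-loading cost. A hypothetical, simplified stand-in illustrating the idea -- not the example's actual class:)

```python
import mxnet as mx

class RepeatFirstBatchIter(mx.io.DataIter):
    """Hypothetical stand-in for DummyIter: cache the wrapped iterator's first
    batch and return it forever, so only compute speed is being measured."""

    def __init__(self, real_iter):
        super(RepeatFirstBatchIter, self).__init__()
        self.provide_data = real_iter.provide_data
        self.provide_label = real_iter.provide_label
        self.batch_size = real_iter.batch_size
        self.the_batch = next(iter(real_iter))    # cache a single batch up front

    def __iter__(self):
        return self

    def next(self):
        return self.the_batch
```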
diff --git a/example/rnn/old/lstm_bucketing.py b/example/rnn/old/lstm_bucketing.py
index 3e34947..0fe4116 100644
--- a/example/rnn/old/lstm_bucketing.py
+++ b/example/rnn/old/lstm_bucketing.py
@@ -23,7 +23,7 @@ import numpy as np
 import mxnet as mx
 
 from lstm import lstm_unroll
-from bucket_io import BucketSentenceIter, default_build_vocab
+from bucket_io import BucketSentenceIter, default_build_vocab, DummyIter
 
 def Perplexity(label, pred):
     label = label.T.reshape((-1,))
@@ -51,7 +51,7 @@ if __name__ == '__main__':
 
     contexts = [mx.context.gpu(i) for i in range(N)]
 
-    vocab = default_build_vocab("./data/ptb.train.txt")
+    vocab = default_build_vocab("./data/sherlockholmes.train.txt")
 
     def sym_gen(seq_len):
         return lstm_unroll(num_lstm_layer, seq_len, len(vocab),
@@ -62,9 +62,9 @@ if __name__ == '__main__':
     init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter("./data/ptb.train.txt", vocab,
+    data_train = BucketSentenceIter("./data/sherlockholmes.train.txt", vocab,
                                     buckets, batch_size, init_states)
-    data_val = BucketSentenceIter("./data/ptb.valid.txt", vocab,
+    data_val = BucketSentenceIter("./data/sherlockholmes.valid.txt", vocab,
                                   buckets, batch_size, init_states)
 
     if dummy_data:
diff --git a/example/rnn/old/lstm_ptb.R b/example/rnn/old/lstm_sherlockholmes.R
similarity index 92%
rename from example/rnn/old/lstm_ptb.R
rename to example/rnn/old/lstm_sherlockholmes.R
index e846705..11d2039 100644
--- a/example/rnn/old/lstm_ptb.R
+++ b/example/rnn/old/lstm_sherlockholmes.R
@@ -15,9 +15,9 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# PennTreeBank Language Model using lstm, you can replace mx.lstm by mx.gru/ mx.rnn to use gru/rnn model
+# Sherlock Holmes language model using LSTM; you can replace mx.lstm with mx.gru / mx.rnn to use a GRU/RNN model
 # The data file can be found at:
-# https://github.com/dmlc/web-data/tree/master/mxnet/ptb
+# https://github.com/dmlc/web-data/tree/master/mxnet/sherlockholmes
 require(hash)
 require(mxnet)
 require(stringr)
@@ -88,10 +88,10 @@ wd=0.00001
 update.period = 1
 
 
-train <- load.data("./data/ptb.train.txt")
+train <- load.data("./data/sherlockholmes.train.txt")
 X.train <- train$X
 dic <- train$dic
-val <- load.data("./data/ptb.valid.txt", dic)
+val <- load.data("./data/sherlockholmes.valid.txt", dic)
 X.val <- val$X
 dic <- val$dic
 X.train.data <- replicate.data(X.train, seq.len)
diff --git a/example/rnn/old/rnn_cell_demo.py b/example/rnn/old/rnn_cell_demo.py
index 3223e93..c5772fa 100644
--- a/example/rnn/old/rnn_cell_demo.py
+++ b/example/rnn/old/rnn_cell_demo.py
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-"""A simple demo of new RNN cell with PTB language model."""
+"""A simple demo of new RNN cell with sherlockholmes language model."""
 
 import os
 
@@ -50,15 +50,15 @@ if __name__ == '__main__':
     momentum = 0.0
 
     contexts = [mx.context.gpu(i) for i in range(4)]
-    vocab = default_build_vocab(os.path.join(data_dir, 'ptb.train.txt'))
+    vocab = default_build_vocab(os.path.join(data_dir, 'sherlockholmes.train.txt'))
 
     init_h = [('LSTM_init_h', (batch_size, num_lstm_layer, num_hidden))]
     init_c = [('LSTM_init_c', (batch_size, num_lstm_layer, num_hidden))]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter(os.path.join(data_dir, 'ptb.train.txt'),
+    data_train = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.train.txt'),
                                     vocab, buckets, batch_size, init_states)
-    data_val = BucketSentenceIter(os.path.join(data_dir, 'ptb.valid.txt'),
+    data_val = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.valid.txt'),
                                   vocab, buckets, batch_size, init_states)
 
     def sym_gen(seq_len):
diff --git a/example/rnn/word_lm/README.md b/example/rnn/word_lm/README.md
index c498032..beed6fc 100644
--- a/example/rnn/word_lm/README.md
+++ b/example/rnn/word_lm/README.md
@@ -1,6 +1,6 @@
 Word Level Language Modeling
 ===========
-This example trains a multi-layer LSTM on Penn Treebank (PTB) language modeling benchmark.
+This example trains a multi-layer LSTM on the Sherlock Holmes language modeling benchmark.
 
 The following techniques have been adopted for SOTA results:
 - [LSTM for LM](https://arxiv.org/pdf/1409.2329.pdf)
@@ -10,7 +10,7 @@ The following techniques have been adopted for SOTA results:
 The example requires MXNet built with CUDA.
 
 ## Data
-The PTB data is the processed version from [(Mikolov et al, 2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf):
+The Sherlock Holmes data is a copyright-free copy of Sherlock Holmes from [Project Gutenberg](http://www.gutenberg.org/cache/epub/1661/pg1661.txt).
 
 ## Usage
 Example runs and the results:
@@ -25,7 +25,7 @@ usage: train.py [-h] [--data DATA] [--emsize EMSIZE] [--nhid NHID]
                 [--batch_size BATCH_SIZE] [--dropout DROPOUT] [--tied]
                 [--bptt BPTT] [--log-interval LOG_INTERVAL] [--seed SEED]
 
-PennTreeBank LSTM Language Model
+Sherlock Holmes LSTM Language Model
 
 optional arguments:
   -h, --help            show this help message and exit
diff --git a/example/rnn/word_lm/get_ptb_data.sh b/example/rnn/word_lm/get_ptb_data.sh
deleted file mode 100755
index 2dc4034..0000000
--- a/example/rnn/word_lm/get_ptb_data.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/usr/bin/env bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-echo
-echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
-read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
-echo
-
-if [ $REPLY != "Y" ]
-then
-    echo "License was not reviewed, aborting script."
-    exit 1
-fi
-
-RNN_DIR=$(cd `dirname $0`; pwd)
-DATA_DIR="${RNN_DIR}/data/"
-
-if [[ ! -d "${DATA_DIR}" ]]; then
-  echo "${DATA_DIR} doesn't exist, will create one";
-  mkdir -p ${DATA_DIR}
-fi
-
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/model-parallel/lstm/get_ptb_data.sh b/example/rnn/word_lm/get_sherlockholmes_data.sh
similarity index 85%
rename from example/model-parallel/lstm/get_ptb_data.sh
rename to example/rnn/word_lm/get_sherlockholmes_data.sh
index 2dc4034..43c8669 100755
--- a/example/model-parallel/lstm/get_ptb_data.sh
+++ b/example/rnn/word_lm/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/word_lm/train.py b/example/rnn/word_lm/train.py
index 53b6bd3..aa64135 100644
--- a/example/rnn/word_lm/train.py
+++ b/example/rnn/word_lm/train.py
@@ -24,8 +24,8 @@ from model import *
 from module import *
 from mxnet.model import BatchEndParam
 
-parser = argparse.ArgumentParser(description='PennTreeBank LSTM Language Model')
-parser.add_argument('--data', type=str, default='./data/ptb.',
+parser = argparse.ArgumentParser(description='Sherlock Holmes LSTM Language Model')
+parser.add_argument('--data', type=str, default='./data/sherlockholmes.',
                     help='location of the data corpus')
 parser.add_argument('--emsize', type=int, default=650,
                     help='size of word embeddings')
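For orientation, the --data value above is a filename prefix: the files fetched by get_sherlockholmes_data.sh are expected at that prefix plus train.txt, valid.txt, and test.txt. A minimal, hypothetical Python sketch of that convention follows; the helper name and error handling are illustrative and not part of this commit.

    import os

    def corpus_paths(prefix="./data/sherlockholmes."):
        """Expand the --data prefix into the three split files the example expects."""
        splits = {split: prefix + split + ".txt" for split in ("train", "valid", "test")}
        missing = [path for path in splits.values() if not os.path.exists(path)]
        if missing:
            raise FileNotFoundError("run get_sherlockholmes_data.sh first; missing: "
                                    + ", ".join(missing))
        return splits

    # corpus_paths() -> {'train': './data/sherlockholmes.train.txt', ...}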
diff --git a/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl b/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
index 8976e64..326e57c 100755
--- a/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
+++ b/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
@@ -98,11 +98,11 @@ my $invalid_label = 0;
 func get_data($layout)
 {
     my ($train_sentences, $vocabulary) = tokenize_text(
-        './data/ptb.train.txt', start_label => $start_label,
+        './data/sherlockholmes.train.txt', start_label => $start_label,
         invalid_label => $invalid_label
     );
     my ($validation_sentences) = tokenize_text(
-        './data/ptb.test.txt', vocab => $vocabulary,
+        './data/sherlockholmes.test.txt', vocab => $vocabulary,
         start_label => $start_label, invalid_label => $invalid_label
     );
     my $data_train  = mx->rnn->BucketSentenceIter(
diff --git a/perl-package/AI-MXNet/examples/get_ptb_data.sh b/perl-package/AI-MXNet/examples/get_ptb_data.sh
deleted file mode 100755
index 2dc4034..0000000
--- a/perl-package/AI-MXNet/examples/get_ptb_data.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/usr/bin/env bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-echo
-echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
-read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
-echo
-
-if [ $REPLY != "Y" ]
-then
-    echo "License was not reviewed, aborting script."
-    exit 1
-fi
-
-RNN_DIR=$(cd `dirname $0`; pwd)
-DATA_DIR="${RNN_DIR}/data/"
-
-if [[ ! -d "${DATA_DIR}" ]]; then
-  echo "${DATA_DIR} doesn't exist, will create one";
-  mkdir -p ${DATA_DIR}
-fi
-
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn-time-major/get_ptb_data.sh b/perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn-time-major/get_ptb_data.sh
rename to perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh
index 2dc4034..43c8669 100755
--- a/example/rnn-time-major/get_ptb_data.sh
+++ b/perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/perl-package/AI-MXNet/examples/lstm_bucketing.pl b/perl-package/AI-MXNet/examples/lstm_bucketing.pl
index e6699d7..3618a62 100755
--- a/perl-package/AI-MXNet/examples/lstm_bucketing.pl
+++ b/perl-package/AI-MXNet/examples/lstm_bucketing.pl
@@ -44,7 +44,7 @@ GetOptions(
 
 =head1 NAME
 
-    lstm_bucketing.pl - Example of training LSTM RNN on Penn Tree Bank data using high level RNN interface
+    lstm_bucketing.pl - Example of training an LSTM RNN on Sherlock Holmes data using the high-level RNN interface
 
 =head1 SYNOPSIS
 
@@ -84,11 +84,11 @@ my $start_label   = 1;
 my $invalid_label = 0;
 
 my ($train_sentences, $vocabulary) = tokenize_text(
-    './data/ptb.train.txt', start_label => $start_label,
+    './data/sherlockholmes.train.txt', start_label => $start_label,
     invalid_label => $invalid_label
 );
 my ($validation_sentences) = tokenize_text(
-    './data/ptb.test.txt', vocab => $vocabulary,
+    './data/sherlockholmes.test.txt', vocab => $vocabulary,
     start_label => $start_label, invalid_label => $invalid_label
 );
 my $data_train  = mx->rnn->BucketSentenceIter(
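For comparison with the Perl pipeline above, here is a minimal Python sketch of the same tokenize-then-bucket flow using the mx.rnn helpers bundled with MXNet; the bucket lengths and batch size are illustrative values and are not taken from this commit.

    import mxnet as mx

    start_label, invalid_label = 1, 0
    buckets = [10, 20, 30, 40, 50, 60]   # illustrative bucket lengths
    batch_size = 32                      # illustrative batch size

    def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
        # Split each non-empty line into word tokens, then encode words as integer ids.
        with open(fname) as f:
            lines = [line.split() for line in f if line.strip()]
        return mx.rnn.encode_sentences(lines, vocab=vocab,
                                       invalid_label=invalid_label,
                                       start_label=start_label)

    train_sentences, vocabulary = tokenize_text('./data/sherlockholmes.train.txt',
                                                start_label=start_label,
                                                invalid_label=invalid_label)
    validation_sentences, _ = tokenize_text('./data/sherlockholmes.test.txt',
                                            vocab=vocabulary,
                                            start_label=start_label,
                                            invalid_label=invalid_label)

    data_train = mx.rnn.BucketSentenceIter(train_sentences, batch_size,
                                           buckets=buckets, invalid_label=invalid_label)
    data_val = mx.rnn.BucketSentenceIter(validation_sentences, batch_size,
                                         buckets=buckets, invalid_label=invalid_label)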
diff --git a/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm b/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
index aa49567..423b0ae 100644
--- a/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
+++ b/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
@@ -33,11 +33,11 @@ AI::MXNet::Module::Bucketing
     my $invalid_label = 0;
 
     my ($train_sentences, $vocabulary) = tokenize_text(
-        './data/ptb.train.txt', start_label => $start_label,
+        './data/sherlockholmes.train.txt', start_label => $start_label,
         invalid_label => $invalid_label
     );
     my ($validation_sentences) = tokenize_text(
-        './data/ptb.test.txt', vocab => $vocabulary,
+        './data/sherlockholmes.test.txt', vocab => $vocabulary,
         start_label => $start_label, invalid_label => $invalid_label
     );
     my $data_train  = mx->rnn->BucketSentenceIter(
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
index f3fe764..44ee6e7 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
@@ -35,9 +35,9 @@ import org.apache.mxnet.module.FitParams
  */
 class LstmBucketing {
   @Option(name = "--data-train", usage = "training set")
-  private val dataTrain: String = "example/rnn/ptb.train.txt"
+  private val dataTrain: String = "example/rnn/sherlockholmes.train.txt"
   @Option(name = "--data-val", usage = "validation set")
-  private val dataVal: String = "example/rnn/ptb.valid.txt"
+  private val dataVal: String = "example/rnn/sherlockholmes.valid.txt"
   @Option(name = "--num-epoch", usage = "the number of training epoch")
   private val numEpoch: Int = 5
   @Option(name = "--gpus", usage = "the gpus will be used, e.g. '0,1,2,3'")