Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/06/28 21:45:18 UTC

[GitHub] szha closed pull request #11435: Replace PTB dataset to Sherlock Holmes

szha closed pull request #11435: Replace PTB dataset to Sherlock Holmes 
URL: https://github.com/apache/incubator-mxnet/pull/11435
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/faq/bucketing.md b/docs/faq/bucketing.md
index 6bcf80fea27..dbfdedde2ac 100644
--- a/docs/faq/bucketing.md
+++ b/docs/faq/bucketing.md
@@ -31,9 +31,9 @@ by maintaining the connection of the states and gradients through time.
 However, this implementation approach results in slow processing.
 This approach works with variable length sequences. For more complicated models (e.g., translation that uses a sequence-to-sequence model), explicitly unrolling is the easiest way. In this example, we introduce the MXNet APIs that allows us to implement bucketing.
 
-## Variable-length Sequence Training for PTB
+## Variable-length Sequence Training for Sherlock Holmes
 
-We use the [PennTreeBank language model example](https://github.com/dmlc/mxnet/tree/master/example/rnn) for this example. If you are not familiar with this example, see [this tutorial (in Julia)](http://dmlc.ml/mxnet/2015/11/15/char-lstm-in-julia.html) first.
+We use the [Sherlock Holmes language model example](https://github.com/dmlc/mxnet/tree/master/example/rnn) for this example. If you are not familiar with this example, see [this tutorial (in Julia)](http://dmlc.ml/mxnet/2015/11/15/char-lstm-in-julia.html) first.
 
 In this example, we use a simple architecture
 consisting of a word-embedding layer
diff --git a/docs/model_zoo/index.md b/docs/model_zoo/index.md
index 19811f22552..f7d14790279 100644
--- a/docs/model_zoo/index.md
+++ b/docs/model_zoo/index.md
@@ -56,6 +56,7 @@ For instructions on using these models, see [the python tutorial on using pre-tr
 MXNet supports many types of recurrent neural networks (RNNs), including Long Short-Term Memory ([LSTM](http://www.bioinf.jku.at/publications/older/2604.pdf))
 and Gated Recurrent Units (GRU) networks. Some available datasets include:
 
+* [Sherlock Holmes](http://www.gutenberg.org/cache/epub/1661/pg1661.txt): Text corpus with ~1 million words. The task is predicting downstream words/characters.
 * [Penn Treebank (PTB)](https://catalog.ldc.upenn.edu/LDC95T7): Text corpus with ~1 million words. Vocabulary is limited to 10,000 words. The task is predicting downstream words/characters.
 * [Shakespeare](http://cs.stanford.edu/people/karpathy/char-rnn/): Complete text from Shakespeare's works.
 * [IMDB reviews](https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3): 25,000 movie reviews, labeled as positive or negative
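
A minimal sketch of how the renamed Sherlock Holmes files are consumed with the bucketing API, mirroring the `tokenize_text()` pattern in `lstm_bucketing.py` below. It assumes the corpus has already been fetched into `./data/` with `get_sherlockholmes_data.sh`; the batch size and bucket list are placeholder values, not tuned settings.

```python
import mxnet as mx

# Minimal sketch; assumes ./data/sherlockholmes.train.txt exists.
start_label, invalid_label = 1, 0
lines = open("./data/sherlockholmes.train.txt").readlines()
lines = [list(filter(None, line.split(' '))) for line in lines]

# Map words to integer ids, building the vocabulary on the fly.
sentences, vocab = mx.rnn.encode_sentences(lines, vocab=None,
                                           invalid_label=invalid_label,
                                           start_label=start_label)

# Group sentences of similar length into buckets for efficient batching.
data_train = mx.rnn.BucketSentenceIter(sentences, batch_size=32,
                                       buckets=[10, 20, 30, 40, 50, 60],
                                       invalid_label=invalid_label)
```
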
diff --git a/example/model-parallel/lstm/README.md b/example/model-parallel/lstm/README.md
deleted file mode 100644
index 6f31ff83484..00000000000
--- a/example/model-parallel/lstm/README.md
+++ /dev/null
@@ -1,13 +0,0 @@
-Model Parallel LSTM
-===================
-
-This is an example showing how to do model parallel LSTM in MXNet.
-
-We use [the PenTreeBank dataset](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/)
-in this example. Download the dataset with below command:
-
-`bash get_ptb_data.sh`
-
-This will download PenTreeBank dataset under `data` folder. Now, you can run the training as follows:
-
-`python lstm_ptb.py`
diff --git a/example/model-parallel/lstm/lstm.py b/example/model-parallel/lstm/lstm.py
deleted file mode 100644
index 75fa533c786..00000000000
--- a/example/model-parallel/lstm/lstm.py
+++ /dev/null
@@ -1,532 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint:skip-file
-import sys
-sys.path.insert(0, "../../python")
-import mxnet as mx
-import numpy as np
-from collections import namedtuple
-import time
-import math
-LSTMState = namedtuple("LSTMState", ["c", "h"])
-LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias",
-                                     "h2h_weight", "h2h_bias"])
-LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol",
-                                     "init_states", "last_states",
-                                     "seq_data", "seq_labels", "seq_outputs",
-                                     "param_blocks"])
-
-def lstm(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.):
-    """LSTM Cell symbol"""
-    if dropout > 0.:
-        indata = mx.sym.Dropout(data=indata, p=dropout)
-    i2h = mx.sym.FullyConnected(data=indata,
-                                weight=param.i2h_weight,
-                                bias=param.i2h_bias,
-                                num_hidden=num_hidden * 4,
-                                name="t%d_l%d_i2h" % (seqidx, layeridx))
-    h2h = mx.sym.FullyConnected(data=prev_state.h,
-                                weight=param.h2h_weight,
-                                bias=param.h2h_bias,
-                                num_hidden=num_hidden * 4,
-                                name="t%d_l%d_h2h" % (seqidx, layeridx))
-    gates = i2h + h2h
-    slice_gates = mx.sym.SliceChannel(gates, num_outputs=4,
-                                      name="t%d_l%d_slice" % (seqidx, layeridx))
-    in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid")
-    in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh")
-    forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid")
-    out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid")
-    next_c = (forget_gate * prev_state.c) + (in_gate * in_transform)
-    next_h = out_gate * mx.sym.Activation(next_c, act_type="tanh")
-    return LSTMState(c=next_c, h=next_h)
-
-
-def lstm_unroll(num_lstm_layer, seq_len, input_size,
-                num_hidden, num_embed, num_label, dropout=0.,
-                concat_decode=True, use_loss=False):
-    """unrolled lstm network"""
-    # initialize the parameter symbols
-    with mx.AttrScope(ctx_group='embed'):
-        embed_weight=mx.sym.Variable("embed_weight")
-
-    with mx.AttrScope(ctx_group='decode'):
-        cls_weight = mx.sym.Variable("cls_weight")
-        cls_bias = mx.sym.Variable("cls_bias")
-
-    param_cells = []
-    last_states = []
-    for i in range(num_lstm_layer):
-        with mx.AttrScope(ctx_group='layer%d' % i):
-            param_cells.append(LSTMParam(i2h_weight = mx.sym.Variable("l%d_i2h_weight" % i),
-                                         i2h_bias = mx.sym.Variable("l%d_i2h_bias" % i),
-                                         h2h_weight = mx.sym.Variable("l%d_h2h_weight" % i),
-                                         h2h_bias = mx.sym.Variable("l%d_h2h_bias" % i)))
-            state = LSTMState(c=mx.sym.Variable("l%d_init_c" % i),
-                              h=mx.sym.Variable("l%d_init_h" % i))
-        last_states.append(state)
-    assert(len(last_states) == num_lstm_layer)
-
-    last_hidden = []
-    for seqidx in range(seq_len):
-        # embedding layer
-        with mx.AttrScope(ctx_group='embed'):
-            data = mx.sym.Variable("t%d_data" % seqidx)
-            hidden = mx.sym.Embedding(data=data, weight=embed_weight,
-                                      input_dim=input_size,
-                                      output_dim=num_embed,
-                                      name="t%d_embed" % seqidx)
-        # stack LSTM
-        for i in range(num_lstm_layer):
-            if i==0:
-                dp=0.
-            else:
-                dp = dropout
-            with mx.AttrScope(ctx_group='layer%d' % i):
-                next_state = lstm(num_hidden, indata=hidden,
-                                  prev_state=last_states[i],
-                                  param=param_cells[i],
-                                  seqidx=seqidx, layeridx=i, dropout=dp)
-                hidden = next_state.h
-                last_states[i] = next_state
-
-        # decoder
-        if dropout > 0.:
-            hidden = mx.sym.Dropout(data=hidden, p=dropout)
-        last_hidden.append(hidden)
-
-    out_prob = []
-    if not concat_decode:
-        for seqidx in range(seq_len):
-            with mx.AttrScope(ctx_group='decode'):
-                fc = mx.sym.FullyConnected(data=last_hidden[seqidx],
-                                           weight=cls_weight,
-                                           bias=cls_bias,
-                                           num_hidden=num_label,
-                                           name="t%d_cls" % seqidx)
-                label = mx.sym.Variable("t%d_label" % seqidx)
-                if use_loss:
-                    # Currently softmax_cross_entropy fails https://github.com/apache/incubator-mxnet/issues/6874
-                    # So, workaround for now to fix this example
-                    out = mx.symbol.softmax(data=fc)
-                    label = mx.sym.Reshape(label, shape=(-1,1))
-                    ce = - mx.sym.broadcast_add(mx.sym.broadcast_mul(label, mx.sym.log(out)),
-                                              mx.sym.broadcast_mul((1 - label), mx.sym.log(1 - out)))
-                    sm = mx.sym.MakeLoss(ce,  name="t%d_sm" % seqidx)
-                else:
-                    sm = mx.sym.SoftmaxOutput(data=fc, label=label, name="t%d_sm" % seqidx)
-                out_prob.append(sm)
-    else:
-        with mx.AttrScope(ctx_group='decode'):
-            concat = mx.sym.Concat(*last_hidden, dim = 0)
-            fc = mx.sym.FullyConnected(data=concat,
-                                       weight=cls_weight,
-                                       bias=cls_bias,
-                                       num_hidden=num_label)
-            label = mx.sym.Variable("label")
-            if use_loss:
-                # Currently softmax_cross_entropy fails https://github.com/apache/incubator-mxnet/issues/6874
-                # So, workaround for now to fix this example
-                out = mx.symbol.softmax(data=fc)
-                label = mx.sym.Reshape(label, shape=(-1, 1))
-                ce = mx.sym.broadcast_add(mx.sym.broadcast_mul(label, mx.sym.log(out)),
-                                              mx.sym.broadcast_mul((1 - label), mx.sym.log(1 - out)))
-                sm = mx.sym.MakeLoss(ce,  name="sm")
-            else:
-                sm = mx.sym.SoftmaxOutput(data=fc, label=label, name="sm")
-            out_prob = [sm]
-
-    for i in range(num_lstm_layer):
-        state = last_states[i]
-        state = LSTMState(c=mx.sym.BlockGrad(state.c, name="l%d_last_c" % i),
-                          h=mx.sym.BlockGrad(state.h, name="l%d_last_h" % i))
-        last_states[i] = state
-
-    unpack_c = [state.c for state in last_states]
-    unpack_h = [state.h for state in last_states]
-    list_all = out_prob + unpack_c + unpack_h
-    return mx.sym.Group(list_all)
-
-
-def is_param_name(name):
-    return name.endswith("weight") or name.endswith("bias") or\
-        name.endswith("gamma") or name.endswith("beta")
-
-
-def setup_rnn_model(default_ctx,
-                    num_lstm_layer, seq_len,
-                    num_hidden, num_embed, num_label,
-                    batch_size, input_size,
-                    initializer, dropout=0.,
-                    group2ctx=None, concat_decode=True,
-                    use_loss=False, buckets=None):
-    """set up rnn model with lstm cells"""
-    max_len = max(buckets)
-    max_rnn_exec = None
-    models = {}
-    buckets.reverse()
-    for bucket_key in buckets:
-        # bind max_len first
-        rnn_sym = lstm_unroll(num_lstm_layer=num_lstm_layer,
-                          num_hidden=num_hidden,
-                          seq_len=seq_len,
-                          input_size=input_size,
-                          num_embed=num_embed,
-                          num_label=num_label,
-                          dropout=dropout,
-                          concat_decode=concat_decode,
-                          use_loss=use_loss)
-        arg_names = rnn_sym.list_arguments()
-        internals = rnn_sym.get_internals()
-
-        input_shapes = {}
-        for name in arg_names:
-            if name.endswith("init_c") or name.endswith("init_h"):
-                input_shapes[name] = (batch_size, num_hidden)
-            elif name.endswith("data"):
-                input_shapes[name] = (batch_size, )
-            elif name == "label":
-                input_shapes[name] = (batch_size * seq_len, )
-            elif name.endswith("label"):
-                input_shapes[name] = (batch_size, )
-            else:
-                pass
-
-        arg_shape, out_shape, aux_shape = rnn_sym.infer_shape(**input_shapes)
-        # bind arrays
-        arg_arrays = []
-        args_grad = {}
-        for shape, name in zip(arg_shape, arg_names):
-            group = internals[name].attr("__ctx_group__")
-            ctx = group2ctx[group] if group is not None else default_ctx
-            arg_arrays.append(mx.nd.zeros(shape, ctx))
-            if is_param_name(name):
-                args_grad[name] = mx.nd.zeros(shape, ctx)
-            if not name.startswith("t"):
-                print("%s group=%s, ctx=%s" % (name, group, str(ctx)))
-
-        # bind with shared executor
-        rnn_exec = None
-        if max_len == bucket_key:
-              rnn_exec = rnn_sym.bind(default_ctx, args=arg_arrays,
-                                args_grad=args_grad,
-                                grad_req="add", group2ctx=group2ctx)
-              max_rnn_exec = rnn_exec
-        else:
-              assert max_rnn_exec is not None
-              rnn_exec = rnn_sym.bind(default_ctx, args=arg_arrays,
-                            args_grad=args_grad,
-                            grad_req="add", group2ctx=group2ctx,
-                            shared_exec = max_rnn_exec)
-
-        param_blocks = []
-        arg_dict = dict(zip(arg_names, rnn_exec.arg_arrays))
-        for i, name in enumerate(arg_names):
-            if is_param_name(name):
-                initializer(name, arg_dict[name])
-                param_blocks.append((i, arg_dict[name], args_grad[name], name))
-            else:
-                assert name not in args_grad
-
-        out_dict = dict(zip(rnn_sym.list_outputs(), rnn_exec.outputs))
-
-        init_states = [LSTMState(c=arg_dict["l%d_init_c" % i],
-                             h=arg_dict["l%d_init_h" % i]) for i in range(num_lstm_layer)]
-
-        seq_data = [rnn_exec.arg_dict["t%d_data" % i] for i in range(seq_len)]
-        # we don't need to store the last state
-        last_states = None
-
-        if concat_decode:
-            seq_outputs = [out_dict["sm_output"]]
-            seq_labels = [rnn_exec.arg_dict["label"]]
-        else:
-            seq_outputs = [out_dict["t%d_sm_output" % i] for i in range(seq_len)]
-            seq_labels = [rnn_exec.arg_dict["t%d_label" % i] for i in range(seq_len)]
-
-        model = LSTMModel(rnn_exec=rnn_exec, symbol=rnn_sym,
-                     init_states=init_states, last_states=last_states,
-                     seq_data=seq_data, seq_labels=seq_labels, seq_outputs=seq_outputs,
-                     param_blocks=param_blocks)
-        models[bucket_key] = model
-    buckets.reverse()
-    return models
-
-
-def set_rnn_inputs(m, X, begin):
-    seq_len = len(m.seq_data)
-    batch_size = m.seq_data[0].shape[0]
-    for seqidx in range(seq_len):
-        idx = (begin + seqidx) % X.shape[0]
-        next_idx = (begin + seqidx + 1) % X.shape[0]
-        x = X[idx, :]
-        y = X[next_idx, :]
-        mx.nd.array(x).copyto(m.seq_data[seqidx])
-        if len(m.seq_labels) == 1:
-            m.seq_labels[0][seqidx*batch_size : seqidx*batch_size+batch_size] = y
-        else:
-            m.seq_labels[seqidx][:] = y
-
-def set_rnn_inputs_from_batch(m, batch, batch_seq_length, batch_size):
-  X = batch.data
-  for seqidx in range(batch_seq_length):
-    idx = seqidx
-    next_idx = (seqidx + 1) % batch_seq_length
-    x = X[idx, :]
-    y = X[next_idx, :]
-    mx.nd.array(x).copyto(m.seq_data[seqidx])
-    if len(m.seq_labels) == 1:
-      m.seq_labels[0][seqidx*batch_size : seqidx*batch_size+batch_size] = y
-    else:
-      m.seq_labels[seqidx][:] = y
-
-def calc_nll_concat(seq_label_probs, batch_size):
-  return -np.sum(np.log(seq_label_probs.asnumpy())) / batch_size
-
-
-def calc_nll(seq_label_probs, batch_size, seq_len):
-  eps = 1e-10
-  nll = 0.
-  for seqidx in range(seq_len):
-    py = seq_label_probs[seqidx].asnumpy()
-    nll += -np.sum(np.log(np.maximum(py, eps))) / batch_size
-    return nll
-
-
-def train_lstm(model, X_train_batch, X_val_batch,
-               num_round, update_period, concat_decode, batch_size, use_loss,
-               optimizer='sgd', half_life=2,max_grad_norm = 5.0, **kwargs):
-    opt = mx.optimizer.create(optimizer,
-                              **kwargs)
-
-    updater = mx.optimizer.get_updater(opt)
-    epoch_counter = 0
-    #log_period = max(1000 / seq_len, 1)
-    log_period = 28
-    last_perp = 10000000.0
-
-    for iteration in range(num_round):
-        nbatch = 0
-        train_nll = 0
-        tic = time.time()
-        for data_batch in X_train_batch:
-            batch_seq_length = data_batch.bucket_key
-            m = model[batch_seq_length]
-            # reset init state
-            for state in m.init_states:
-              state.c[:] = 0.0
-              state.h[:] = 0.0
-
-            head_grad = []
-            if use_loss:
-              ctx = m.seq_outputs[0].context
-              head_grad = [mx.nd.ones((1,), ctx) for x in m.seq_outputs]
-
-            set_rnn_inputs_from_batch(m, data_batch, batch_seq_length, batch_size)
-
-            m.rnn_exec.forward(is_train=True)
-            # probability of each label class, used to evaluate nll
-            # Change back to individual ops to see if fine grained scheduling helps.
-            if not use_loss:
-                if concat_decode:
-                    seq_label_probs = mx.nd.choose_element_0index(m.seq_outputs[0], m.seq_labels[0])
-                else:
-                    seq_label_probs = [mx.nd.choose_element_0index(out, label).copyto(mx.cpu())
-                                       for out, label in zip(m.seq_outputs, m.seq_labels)]
-                m.rnn_exec.backward()
-            else:
-                seq_loss = [x.copyto(mx.cpu()) for x in m.seq_outputs]
-                m.rnn_exec.backward(head_grad)
-
-            # update epoch counter
-            epoch_counter += 1
-            if epoch_counter % update_period == 0:
-                # update parameters
-                norm = 0.
-                for idx, weight, grad, name in m.param_blocks:
-                    grad /= batch_size
-                    l2_norm = mx.nd.norm(grad).asscalar()
-                    norm += l2_norm*l2_norm
-                norm = math.sqrt(norm)
-                for idx, weight, grad, name in m.param_blocks:
-                    if norm > max_grad_norm:
-                        grad *= (max_grad_norm / norm)
-                    updater(idx, grad, weight)
-                    # reset gradient to zero
-                    grad[:] = 0.0
-            if not use_loss:
-                if concat_decode:
-                    train_nll += calc_nll_concat(seq_label_probs, batch_size)
-                else:
-                    train_nll += calc_nll(seq_label_probs, batch_size, batch_seq_length)
-            else:
-                train_nll += sum([x.sum().asscalar() for x in seq_loss]) / batch_size
-
-            nbatch += batch_size
-            toc = time.time()
-            if epoch_counter % log_period == 0:
-                print("Iter [%d] Train: Time: %.3f sec, NLL=%.3f, Perp=%.3f" % (
-                    epoch_counter, toc - tic, train_nll / nbatch, np.exp(train_nll / nbatch)))
-        # end of training loop
-        toc = time.time()
-        print("Iter [%d] Train: Time: %.3f sec, NLL=%.3f, Perp=%.3f" % (
-            iteration, toc - tic, train_nll / nbatch, np.exp(train_nll / nbatch)))
-
-        val_nll = 0.0
-        nbatch = 0
-        for data_batch in X_val_batch:
-            batch_seq_length = data_batch.bucket_key
-            m = model[batch_seq_length]
-
-            # validation set, reset states
-            for state in m.init_states:
-                state.h[:] = 0.0
-                state.c[:] = 0.0
-
-            set_rnn_inputs_from_batch(m, data_batch, batch_seq_length, batch_size)
-            m.rnn_exec.forward(is_train=False)
-
-            # probability of each label class, used to evaluate nll
-            if not use_loss:
-                if concat_decode:
-                    seq_label_probs = mx.nd.choose_element_0index(m.seq_outputs[0], m.seq_labels[0])
-                else:
-                    seq_label_probs = [mx.nd.choose_element_0index(out, label).copyto(mx.cpu())
-                                       for out, label in zip(m.seq_outputs, m.seq_labels)]
-            else:
-                seq_loss = [x.copyto(mx.cpu()) for x in m.seq_outputs]
-
-            if not use_loss:
-                if concat_decode:
-                    val_nll += calc_nll_concat(seq_label_probs, batch_size)
-                else:
-                    val_nll += calc_nll(seq_label_probs, batch_size, batch_seq_length)
-            else:
-                val_nll += sum([x.sum().asscalar() for x in seq_loss]) / batch_size
-            nbatch += batch_size
-
-        perp = np.exp(val_nll / nbatch)
-        print("Iter [%d] Val: NLL=%.3f, Perp=%.3f" % (
-            iteration, val_nll / nbatch, np.exp(val_nll / nbatch)))
-        if last_perp - 1.0 < perp:
-            opt.lr *= 0.5
-            print("Reset learning rate to %g" % opt.lr)
-        last_perp = perp
-        X_val_batch.reset()
-        X_train_batch.reset()
-
-# is this function being used?
-def setup_rnn_sample_model(ctx,
-                           params,
-                           num_lstm_layer,
-                           num_hidden, num_embed, num_label,
-                           batch_size, input_size):
-    seq_len = 1
-    rnn_sym = lstm_unroll(num_lstm_layer=num_lstm_layer,
-                          input_size=input_size,
-                          num_hidden=num_hidden,
-                          seq_len=seq_len,
-                          num_embed=num_embed,
-                          num_label=num_label)
-    arg_names = rnn_sym.list_arguments()
-    input_shapes = {}
-    for name in arg_names:
-        if name.endswith("init_c") or name.endswith("init_h"):
-            input_shapes[name] = (batch_size, num_hidden)
-        elif name.endswith("data"):
-            input_shapes[name] = (batch_size, )
-        else:
-            pass
-    arg_shape, out_shape, aux_shape = rnn_sym.infer_shape(**input_shapes)
-    arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
-    arg_dict = dict(zip(arg_names, arg_arrays))
-    for name, arr in params.items():
-        arg_dict[name][:] = arr
-    rnn_exec = rnn_sym.bind(ctx=ctx, args=arg_arrays, args_grad=None, grad_req="null")
-    out_dict = dict(zip(rnn_sym.list_outputs(), rnn_exec.outputs))
-    param_blocks = []
-    params_array = list(params.items())
-    for i in range(len(params)):
-        param_blocks.append((i, params_array[i][1], None, params_array[i][0]))
-    init_states = [LSTMState(c=arg_dict["l%d_init_c" % i],
-                             h=arg_dict["l%d_init_h" % i]) for i in range(num_lstm_layer)]
-
-    if concat_decode:
-        seq_labels = [rnn_exec.arg_dict["label"]]
-        seq_outputs = [out_dict["sm_output"]]
-    else:
-        seq_labels = [rnn_exec.arg_dict["t%d_label" % i] for i in range(seq_len)]
-        seq_outputs = [out_dict["t%d_sm" % i] for i in range(seq_len)]
-
-    seq_data = [rnn_exec.arg_dict["t%d_data" % i] for i in range(seq_len)]
-    last_states = [LSTMState(c=out_dict["l%d_last_c_output" % i],
-                             h=out_dict["l%d_last_h_output" % i]) for i in range(num_lstm_layer)]
-
-    return LSTMModel(rnn_exec=rnn_exec, symbol=rnn_sym,
-                     init_states=init_states, last_states=last_states,
-                     seq_data=seq_data, seq_labels=seq_labels, seq_outputs=seq_outputs,
-                     param_blocks=param_blocks)
-
-# Python3 np.random.choice is too strict in eval float probability so we use an alternative
-import random
-import bisect
-import collections
-
-def _cdf(weights):
-    total = sum(weights)
-    result = []
-    cumsum = 0
-    for w in weights:
-        cumsum += w
-        result.append(cumsum / total)
-    return result
-
-def _choice(population, weights):
-    assert len(population) == len(weights)
-    cdf_vals = _cdf(weights)
-    x = random.random()
-    idx = bisect.bisect(cdf_vals, x)
-    return population[idx]
-
-def sample_lstm(model, X_input_batch, seq_len, temperature=1., sample=True):
-    m = model
-    vocab = m.seq_outputs.shape[1]
-    batch_size = m.seq_data[0].shape[0]
-    outputs_ndarray = mx.nd.zeros(m.seq_outputs.shape)
-    outputs_batch = []
-    tmp = [i for i in range(vocab)]
-    for i in range(seq_len):
-        outputs_batch.append(np.zeros(X_input_batch.shape))
-    for i in range(seq_len):
-        set_rnn_inputs(m, X_input_batch, 0)
-        m.rnn_exec.forward(is_train=False)
-        outputs_ndarray[:] = m.seq_outputs
-        for init, last in zip(m.init_states, m.last_states):
-            last.c.copyto(init.c)
-            last.h.copyto(init.h)
-        prob = np.clip(outputs_ndarray.asnumpy(), 1e-6, 1 - 1e-6)
-        if sample:
-            rescale = np.exp(np.log(prob) / temperature)
-            for j in range(batch_size):
-                p = rescale[j, :]
-                p[:] /= p.sum()
-                outputs_batch[i][j] = _choice(tmp, p)
-        else:
-            outputs_batch[i][:] = np.argmax(prob, axis=1)
-        X_input_batch[:] = outputs_batch[i]
-    return outputs_batch
diff --git a/example/model-parallel/lstm/lstm_ptb.py b/example/model-parallel/lstm/lstm_ptb.py
deleted file mode 100644
index 965ba1950b0..00000000000
--- a/example/model-parallel/lstm/lstm_ptb.py
+++ /dev/null
@@ -1,128 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint:skip-file
-import lstm
-import sys
-sys.path.insert(0, "../../python")
-import mxnet as mx
-import numpy as np
-# reuse the bucket_io library
-sys.path.insert(0, "../../rnn/old")
-from bucket_io import BucketSentenceIter, default_build_vocab
-
-"""
-PennTreeBank Language Model
-We would like to thanks Wojciech Zaremba for his Torch LSTM code
-
-The data file can be found at:
-https://github.com/dmlc/web-data/tree/master/mxnet/ptb
-"""
-
-def load_data(path, dic=None):
-    fi = open(path)
-    content = fi.read()
-    content = content.replace('\n', '<eos>')
-    content = content.split(' ')
-    print("Loading %s, size of data = %d" % (path, len(content)))
-    x = np.zeros(len(content))
-    if dic is None:
-        dic = {}
-    idx = 0
-    for i in range(len(content)):
-        word = content[i]
-        if len(word) == 0:
-            continue
-        if not word in dic:
-            dic[word] = idx
-            idx += 1
-        x[i] = dic[word]
-    print("Unique token: %d" % len(dic))
-    return x, dic
-
-def drop_tail(X, seq_len):
-    shape = X.shape
-    nstep = int(shape[0] / seq_len)
-    return X[0:(nstep * seq_len), :]
-
-
-def replicate_data(x, batch_size):
-    nbatch = int(x.shape[0] / batch_size)
-    x_cut = x[:nbatch * batch_size]
-    data = x_cut.reshape((nbatch, batch_size), order='F')
-    return data
-
-batch_size = 20
-seq_len = 35
-num_hidden = 400
-num_embed = 200
-num_lstm_layer = 8
-num_round = 25
-learning_rate= 0.1
-wd=0.
-momentum=0.0
-max_grad_norm = 5.0
-update_period = 1
-
-dic = default_build_vocab("./data/ptb.train.txt")
-vocab = len(dic)
-
-# static buckets
-buckets = [8, 16, 24, 32, 60]
-
-init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
-init_states = init_c + init_h
-
-X_train_batch = BucketSentenceIter("./data/ptb.train.txt", dic,
-                                        buckets, batch_size, init_states, model_parallel=True)
-X_val_batch = BucketSentenceIter("./data/ptb.valid.txt", dic,
-                                      buckets, batch_size, init_states, model_parallel=True)
-
-ngpu = 2
-# A simple two GPU placement plan
-group2ctx = {'embed': mx.gpu(0),
-             'decode': mx.gpu(ngpu - 1)}
-
-for i in range(num_lstm_layer):
-    group2ctx['layer%d' % i] = mx.gpu(i * ngpu // num_lstm_layer)
-
-# whether do group-wise concat
-concat_decode = False
-use_loss=True
-model = lstm.setup_rnn_model(mx.gpu(), group2ctx=group2ctx,
-                             concat_decode=concat_decode,
-                             use_loss=use_loss,
-                             num_lstm_layer=num_lstm_layer,
-                             seq_len=X_train_batch.default_bucket_key,
-                             num_hidden=num_hidden,
-                             num_embed=num_embed,
-                             num_label=vocab,
-                             batch_size=batch_size,
-                             input_size=vocab,
-                             initializer=mx.initializer.Uniform(0.1),dropout=0.5, buckets=buckets)
-
-lstm.train_lstm(model, X_train_batch, X_val_batch,
-                num_round=num_round,
-                concat_decode=concat_decode,
-                use_loss=use_loss,
-                half_life=2,
-                max_grad_norm = max_grad_norm,
-                update_period=update_period,
-                learning_rate=learning_rate,
-                batch_size = batch_size,
-                wd=wd)
diff --git a/example/rnn-time-major/get_ptb_data.sh b/example/rnn-time-major/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn-time-major/get_ptb_data.sh
rename to example/rnn-time-major/get_sherlockholmes_data.sh
index 2dc4034a938..43c8669e003 100755
--- a/example/rnn-time-major/get_ptb_data.sh
+++ b/example/rnn-time-major/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn-time-major/readme.md b/example/rnn-time-major/readme.md
index 7983fe8c6b3..b30b8410b04 100644
--- a/example/rnn-time-major/readme.md
+++ b/example/rnn-time-major/readme.md
@@ -7,18 +7,18 @@ As example of Batch-major RNN is available in MXNet [RNN Bucketing example](http
 	
 ## Running the example
 - Prerequisite: an instance with GPU compute resources is required to run MXNet RNN
-- Make the shell script ```get_ptb_data.sh``` executable:
+- Make the shell script ```get_sherlockholmes_data.sh``` executable:
     ```bash 
-    chmod +x get_ptb_data.sh
+    chmod +x get_sherlockholmes_data.sh
     ```
-- Run ```get_ptb_data.sh``` to download the PTB dataset, and follow the instructions to review the license:
+- Run ```get_sherlockholmes_data.sh``` to download the Sherlock Holmes dataset, and follow the instructions to review the license:
     ```bash
-    ./get_ptb_data.sh
+    ./get_sherlockholmes_data.sh
     ```
-    The PTB data sets will be downloaded into ./data directory, and available for the example to train on.
+    The Sherlock Holmes data sets will be downloaded into the ./data directory and will be available for the example to train on.
 - Run the example:
     ```bash
-    python python rnn_cell_demo.py
+    python rnn_cell_demo.py
     ```
     
     If everything goes well, console will plot training speed and perplexity that you can compare to the batch major RNN.
diff --git a/example/rnn-time-major/rnn_cell_demo.py b/example/rnn-time-major/rnn_cell_demo.py
index cf1e0a0cd1e..80b281b3bdb 100644
--- a/example/rnn-time-major/rnn_cell_demo.py
+++ b/example/rnn-time-major/rnn_cell_demo.py
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-"""A simple demo of new RNN cell with PTB language model."""
+"""A simple demo of new RNN cell with Sherlock Holmes language model."""
 
 ################################################################################
 # Speed test (time major is 1.5~2 times faster than batch major).
@@ -93,16 +93,16 @@ def Perplexity(label, pred):
     gpu_count = 1
     contexts = [mx.context.gpu(i) for i in range(gpu_count)]
 
-    vocab = default_build_vocab(os.path.join(data_dir, 'ptb.train.txt'))
+    vocab = default_build_vocab(os.path.join(data_dir, 'sherlockholmes.train.txt'))
 
     init_h = [mx.io.DataDesc('LSTM_state', (num_lstm_layer, batch_size, num_hidden), layout='TNC')]
     init_c = [mx.io.DataDesc('LSTM_state_cell', (num_lstm_layer, batch_size, num_hidden), layout='TNC')]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter(os.path.join(data_dir, 'ptb.train.txt'),
+    data_train = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.train.txt'),
                                     vocab, buckets, batch_size, init_states,
                                     time_major=True)
-    data_val = BucketSentenceIter(os.path.join(data_dir, 'ptb.valid.txt'),
+    data_val = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.valid.txt'),
                                   vocab, buckets, batch_size, init_states,
                                   time_major=True)
 
diff --git a/example/rnn/README.md b/example/rnn/README.md
index f0d80c3a61f..a0846fa3da8 100644
--- a/example/rnn/README.md
+++ b/example/rnn/README.md
@@ -8,7 +8,7 @@ Here is a short overview of what is in this directory.
 
 Directory | What's in it?
 --- | ---
-`word_lm/` | Language model trained on the PTB dataset achieving state of the art performance
+`word_lm/` | Language model trained on the Sherlock Holmes dataset achieving state of the art performance
 `bucketing/` | Language model with bucketing API with python
 `bucket_R/` | Language model with bucketing API with R
 `old/` | Language model trained with low level symbol interface (deprecated)
diff --git a/example/rnn/bucketing/README.md b/example/rnn/bucketing/README.md
index b46642bea0a..9bbeefd21e4 100644
--- a/example/rnn/bucketing/README.md
+++ b/example/rnn/bucketing/README.md
@@ -3,13 +3,13 @@ RNN Example
 This folder contains RNN examples using high level mxnet.rnn interface.
 
 ## Data
-1) Review the license for the PenTreeBank dataset and ensure that you agree to it. Then uncomment the lines in the 'get_ptb_data.sh' script that download the dataset.
+1) Review the license for the Sherlock Holmes dataset and ensure that you agree to it. Then uncomment the lines in the 'get_sherlockholmes_data.sh' script that download the dataset.
 
-2) Run `get_ptb_data.sh` to download PenTreeBank data.
+2) Run `get_sherlockholmes_data.sh` to download Sherlock Holmes data.
 
 ## Python
 
-- Generate the PennTreeBank language model by using LSTM:
+- Generate the Sherlock Holmes language model by using LSTM:
 
   For Python2 (CPU support): can take 2+ hours on AWS-EC2-p2.16xlarge
 
@@ -23,11 +23,11 @@ This folder contains RNN examples using high level mxnet.rnn interface.
 
   For Python2 (GPU support only): can take 50+ minutes on AWS-EC2-p2.16xlarge
 
-      $ python  --gpus 0,1,2,3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) 
+      $ python [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) --gpus 0,1,2,3
 
   For Python3 (GPU support only): can take 50+ minutes on AWS-EC2-p2.16xlarge
 
-      $ python3 --gpus 0,1,2,3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) 
+      $ python3 [cudnn_lstm_bucketing.py](cudnn_lstm_bucketing.py) --gpus 0,1,2,3
 
 
 ### Performance Note:
diff --git a/example/rnn/bucketing/cudnn_rnn_bucketing.py b/example/rnn/bucketing/cudnn_rnn_bucketing.py
index 5825290e73e..66d5a55c02c 100644
--- a/example/rnn/bucketing/cudnn_rnn_bucketing.py
+++ b/example/rnn/bucketing/cudnn_rnn_bucketing.py
@@ -19,7 +19,7 @@
 import mxnet as mx
 import argparse
 
-parser = argparse.ArgumentParser(description="Train RNN on Penn Tree Bank",
+parser = argparse.ArgumentParser(description="Train RNN on Sherlock Holmes",
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('--test', default=False, action='store_true',
                     help='whether to do testing instead of training')
@@ -81,9 +81,9 @@ def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
     return sentences, vocab
 
 def get_data(layout):
-    train_sent, vocab = tokenize_text("./data/ptb.train.txt", start_label=start_label,
+    train_sent, vocab = tokenize_text("./data/sherlockholmes.train.txt", start_label=start_label,
                                       invalid_label=invalid_label)
-    val_sent, _ = tokenize_text("./data/ptb.test.txt", vocab=vocab, start_label=start_label,
+    val_sent, _ = tokenize_text("./data/sherlockholmes.test.txt", vocab=vocab, start_label=start_label,
                                 invalid_label=invalid_label)
 
     data_train  = mx.rnn.BucketSentenceIter(train_sent, args.batch_size, buckets=buckets,
diff --git a/example/model-parallel/lstm/get_ptb_data.sh b/example/rnn/bucketing/get_sherlockholmes_data.sh
similarity index 85%
rename from example/model-parallel/lstm/get_ptb_data.sh
rename to example/rnn/bucketing/get_sherlockholmes_data.sh
index 2dc4034a938..43c8669e003 100755
--- a/example/model-parallel/lstm/get_ptb_data.sh
+++ b/example/rnn/bucketing/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/bucketing/lstm_bucketing.py b/example/rnn/bucketing/lstm_bucketing.py
index 0e7f064f007..7f150104f45 100644
--- a/example/rnn/bucketing/lstm_bucketing.py
+++ b/example/rnn/bucketing/lstm_bucketing.py
@@ -20,7 +20,7 @@
 import argparse
 import os
 
-parser = argparse.ArgumentParser(description="Train RNN on Penn Tree Bank",
+parser = argparse.ArgumentParser(description="Train RNN on Sherlock Holmes",
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('--num-layers', type=int, default=2,
                     help='number of stacked RNN layers')
@@ -50,7 +50,7 @@
 
 def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
     if not os.path.isfile(fname):
-        raise IOError("Please use get_ptb_data.sh to download requied file (data/ptb.train.txt)")
+        raise IOError("Please use get_sherlockholmes_data.sh to download the required file (data/sherlockholmes.train.txt)")
     lines = open(fname).readlines()
     lines = [filter(None, i.split(' ')) for i in lines]
     sentences, vocab = mx.rnn.encode_sentences(lines, vocab=vocab, invalid_label=invalid_label,
@@ -71,9 +71,9 @@ def tokenize_text(fname, vocab=None, invalid_label=-1, start_label=0):
     start_label = 1
     invalid_label = 0
 
-    train_sent, vocab = tokenize_text("./data/ptb.train.txt", start_label=start_label,
+    train_sent, vocab = tokenize_text("./data/sherlockholmes.train.txt", start_label=start_label,
                                       invalid_label=invalid_label)
-    val_sent, _ = tokenize_text("./data/ptb.test.txt", vocab=vocab, start_label=start_label,
+    val_sent, _ = tokenize_text("./data/sherlockholmes.test.txt", vocab=vocab, start_label=start_label,
                                 invalid_label=invalid_label)
 
     data_train  = mx.rnn.BucketSentenceIter(train_sent, args.batch_size, buckets=buckets,
@@ -92,7 +92,7 @@ def sym_gen(seq_len):
                                  output_dim=args.num_embed, name='embed')
 
         stack.reset()
-        outputs, states = stack.unroll(seq_len, inputs=embed, merge_outputs=True)
+        outputs = stack.unroll(seq_len, inputs=embed, merge_outputs=True)[0]
 
         pred = mx.sym.Reshape(outputs, shape=(-1, args.num_hidden))
         pred = mx.sym.FullyConnected(data=pred, num_hidden=len(vocab), name='pred')
diff --git a/example/rnn/old/README.md b/example/rnn/old/README.md
index c03b36a9d84..5d73523dd96 100644
--- a/example/rnn/old/README.md
+++ b/example/rnn/old/README.md
@@ -3,14 +3,14 @@ RNN Example
 This folder contains RNN examples using low level symbol interface.
 
 ## Data
-Run `get_ptb_data.sh` to download PenTreeBank data.
+Run `get_sherlockholmes_data.sh` to download Sherlock Holmes data.
 
 ## Python
 
 - [lstm.py](lstm.py) Functions for building a LSTM Network
 - [gru.py](gru.py) Functions for building a GRU Network
-- [lstm_bucketing.py](lstm_bucketing.py) PennTreeBank language model by using LSTM
-- [gru_bucketing.py](gru_bucketing.py) PennTreeBank language model by using GRU
+- [lstm_bucketing.py](lstm_bucketing.py) Sherlock Holmes language model by using LSTM
+- [gru_bucketing.py](gru_bucketing.py) Sherlock Holmes language model by using GRU
 - [char-rnn.ipynb](char-rnn.ipynb) Notebook to demo how to train a character LSTM by using ```lstm.py```
 
 
diff --git a/example/rnn/old/bucket_io.py b/example/rnn/old/bucket_io.py
index 21f96ef196f..96c63fbbf44 100644
--- a/example/rnn/old/bucket_io.py
+++ b/example/rnn/old/bucket_io.py
@@ -202,7 +202,7 @@ def make_data_iter_plan(self):
         # truncate each bucket into multiple of batch-size
         bucket_n_batches = []
         for i in range(len(self.data)):
-            bucket_n_batches.append(len(self.data[i]) / self.batch_size)
+            bucket_n_batches.append(np.floor(len(self.data[i]) / self.batch_size))
             self.data[i] = self.data[i][:int(bucket_n_batches[i]*self.batch_size)]
 
         bucket_plan = np.hstack([np.zeros(n, int)+i for i, n in enumerate(bucket_n_batches)])
diff --git a/example/rnn/bucketing/get_ptb_data.sh b/example/rnn/old/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn/bucketing/get_ptb_data.sh
rename to example/rnn/old/get_sherlockholmes_data.sh
index 2dc4034a938..43c8669e003 100755
--- a/example/rnn/bucketing/get_ptb_data.sh
+++ b/example/rnn/old/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/old/gru_bucketing.py b/example/rnn/old/gru_bucketing.py
index 226018c0268..b9f651a90dc 100644
--- a/example/rnn/old/gru_bucketing.py
+++ b/example/rnn/old/gru_bucketing.py
@@ -23,7 +23,7 @@
 import mxnet as mx
 
 from gru import gru_unroll
-from bucket_io import BucketSentenceIter, default_build_vocab
+from bucket_io import BucketSentenceIter, default_build_vocab, DummyIter
 
 def Perplexity(label, pred):
     label = label.T.reshape((-1,))
@@ -51,7 +51,7 @@ def Perplexity(label, pred):
     #contexts = [mx.context.gpu(i) for i in range(1)]
     contexts = mx.context.cpu()
 
-    vocab = default_build_vocab("./data/ptb.train.txt")
+    vocab = default_build_vocab("./data/sherlockholmes.train.txt")
 
     def sym_gen(seq_len):
         return gru_unroll(num_lstm_layer, seq_len, len(vocab),
@@ -60,9 +60,9 @@ def sym_gen(seq_len):
 
     init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
 
-    data_train = BucketSentenceIter("./data/ptb.train.txt", vocab,
+    data_train = BucketSentenceIter("./data/sherlockholmes.train.txt", vocab,
                                     buckets, batch_size, init_h)
-    data_val = BucketSentenceIter("./data/ptb.valid.txt", vocab,
+    data_val = BucketSentenceIter("./data/sherlockholmes.valid.txt", vocab,
                                   buckets, batch_size, init_h)
 
     if dummy_data:
diff --git a/example/rnn/old/lstm_bucketing.py b/example/rnn/old/lstm_bucketing.py
index 3e3494776dc..0fe4116250a 100644
--- a/example/rnn/old/lstm_bucketing.py
+++ b/example/rnn/old/lstm_bucketing.py
@@ -23,7 +23,7 @@
 import mxnet as mx
 
 from lstm import lstm_unroll
-from bucket_io import BucketSentenceIter, default_build_vocab
+from bucket_io import BucketSentenceIter, default_build_vocab, DummyIter
 
 def Perplexity(label, pred):
     label = label.T.reshape((-1,))
@@ -51,7 +51,7 @@ def Perplexity(label, pred):
 
     contexts = [mx.context.gpu(i) for i in range(N)]
 
-    vocab = default_build_vocab("./data/ptb.train.txt")
+    vocab = default_build_vocab("./data/sherlockholmes.train.txt")
 
     def sym_gen(seq_len):
         return lstm_unroll(num_lstm_layer, seq_len, len(vocab),
@@ -62,9 +62,9 @@ def sym_gen(seq_len):
     init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter("./data/ptb.train.txt", vocab,
+    data_train = BucketSentenceIter("./data/sherlockholmes.train.txt", vocab,
                                     buckets, batch_size, init_states)
-    data_val = BucketSentenceIter("./data/ptb.valid.txt", vocab,
+    data_val = BucketSentenceIter("./data/sherlockholmes.valid.txt", vocab,
                                   buckets, batch_size, init_states)
 
     if dummy_data:
diff --git a/example/rnn/old/lstm_ptb.R b/example/rnn/old/lstm_sherlockholmes.R
similarity index 92%
rename from example/rnn/old/lstm_ptb.R
rename to example/rnn/old/lstm_sherlockholmes.R
index e8467059f38..11d20394407 100644
--- a/example/rnn/old/lstm_ptb.R
+++ b/example/rnn/old/lstm_sherlockholmes.R
@@ -15,9 +15,9 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# PennTreeBank Language Model using lstm, you can replace mx.lstm by mx.gru/ mx.rnn to use gru/rnn model
+# Sherlock Holmes Language Model using lstm, you can replace mx.lstm by mx.gru/ mx.rnn to use gru/rnn model
 # The data file can be found at:
-# https://github.com/dmlc/web-data/tree/master/mxnet/ptb
+# https://github.com/dmlc/web-data/tree/master/mxnet/sherlockholmes
 require(hash)
 require(mxnet)
 require(stringr)
@@ -88,10 +88,10 @@ wd=0.00001
 update.period = 1
 
 
-train <- load.data("./data/ptb.train.txt")
+train <- load.data("./data/sherlockholmes.train.txt")
 X.train <- train$X
 dic <- train$dic
-val <- load.data("./data/ptb.valid.txt", dic)
+val <- load.data("./data/sherlockholmes.valid.txt", dic)
 X.val <- val$X
 dic <- val$dic
 X.train.data <- replicate.data(X.train, seq.len)
diff --git a/example/rnn/old/rnn_cell_demo.py b/example/rnn/old/rnn_cell_demo.py
index 3223e936c37..c5772fa3a5b 100644
--- a/example/rnn/old/rnn_cell_demo.py
+++ b/example/rnn/old/rnn_cell_demo.py
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-"""A simple demo of new RNN cell with PTB language model."""
+"""A simple demo of new RNN cell with sherlockholmes language model."""
 
 import os
 
@@ -50,15 +50,15 @@ def Perplexity(label, pred):
     momentum = 0.0
 
     contexts = [mx.context.gpu(i) for i in range(4)]
-    vocab = default_build_vocab(os.path.join(data_dir, 'ptb.train.txt'))
+    vocab = default_build_vocab(os.path.join(data_dir, 'sherlockholmes.train.txt'))
 
     init_h = [('LSTM_init_h', (batch_size, num_lstm_layer, num_hidden))]
     init_c = [('LSTM_init_c', (batch_size, num_lstm_layer, num_hidden))]
     init_states = init_c + init_h
 
-    data_train = BucketSentenceIter(os.path.join(data_dir, 'ptb.train.txt'),
+    data_train = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.train.txt'),
                                     vocab, buckets, batch_size, init_states)
-    data_val = BucketSentenceIter(os.path.join(data_dir, 'ptb.valid.txt'),
+    data_val = BucketSentenceIter(os.path.join(data_dir, 'sherlockholmes.valid.txt'),
                                   vocab, buckets, batch_size, init_states)
 
     def sym_gen(seq_len):
diff --git a/example/rnn/word_lm/README.md b/example/rnn/word_lm/README.md
index c4980326e45..beed6fc8d89 100644
--- a/example/rnn/word_lm/README.md
+++ b/example/rnn/word_lm/README.md
@@ -1,6 +1,6 @@
 Word Level Language Modeling
 ===========
-This example trains a multi-layer LSTM on Penn Treebank (PTB) language modeling benchmark.
+This example trains a multi-layer LSTM on the Sherlock Holmes language modeling benchmark.
 
 The following techniques have been adopted for SOTA results:
 - [LSTM for LM](https://arxiv.org/pdf/1409.2329.pdf)
@@ -10,7 +10,7 @@ The following techniques have been adopted for SOTA results:
 The example requires MXNet built with CUDA.
 
 ## Data
-The PTB data is the processed version from [(Mikolov et al, 2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf):
+The Sherlock Holmes data is a copyright-free copy of Sherlock Holmes from [Project Gutenberg](http://www.gutenberg.org/cache/epub/1661/pg1661.txt).
 
 ## Usage
 Example runs and the results:
@@ -25,7 +25,7 @@ usage: train.py [-h] [--data DATA] [--emsize EMSIZE] [--nhid NHID]
                 [--batch_size BATCH_SIZE] [--dropout DROPOUT] [--tied]
                 [--bptt BPTT] [--log-interval LOG_INTERVAL] [--seed SEED]
 
-PennTreeBank LSTM Language Model
+Sherlock Holmes LSTM Language Model
 
 optional arguments:
   -h, --help            show this help message and exit
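
A hypothetical invocation against the renamed data prefix, using only the flags listed in the usage text above (values are placeholders, not the settings behind the reported results; assumes the corpus was fetched with `get_sherlockholmes_data.sh`):

```bash
# Placeholder hyperparameters; adjust as needed.
python train.py --data ./data/sherlockholmes. \
                --emsize 650 --nhid 650 \
                --batch_size 32 --bptt 35 \
                --dropout 0.5 --tied
```
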
diff --git a/example/rnn/word_lm/get_ptb_data.sh b/example/rnn/word_lm/get_ptb_data.sh
deleted file mode 100755
index 2dc4034a938..00000000000
--- a/example/rnn/word_lm/get_ptb_data.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/usr/bin/env bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-echo
-echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
-read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
-echo
-
-if [ $REPLY != "Y" ]
-then
-    echo "License was not reviewed, aborting script."
-    exit 1
-fi
-
-RNN_DIR=$(cd `dirname $0`; pwd)
-DATA_DIR="${RNN_DIR}/data/"
-
-if [[ ! -d "${DATA_DIR}" ]]; then
-  echo "${DATA_DIR} doesn't exist, will create one";
-  mkdir -p ${DATA_DIR}
-fi
-
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/old/get_ptb_data.sh b/example/rnn/word_lm/get_sherlockholmes_data.sh
similarity index 85%
rename from example/rnn/old/get_ptb_data.sh
rename to example/rnn/word_lm/get_sherlockholmes_data.sh
index 2dc4034a938..43c8669e003 100755
--- a/example/rnn/old/get_ptb_data.sh
+++ b/example/rnn/word_lm/get_sherlockholmes_data.sh
@@ -19,7 +19,7 @@
 
 echo
 echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
 read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
 echo
 
@@ -37,7 +37,7 @@ if [[ ! -d "${DATA_DIR}" ]]; then
   mkdir -p ${DATA_DIR}
 fi
 
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
 wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/example/rnn/word_lm/train.py b/example/rnn/word_lm/train.py
index 53b6bd35f27..aa641358cf9 100644
--- a/example/rnn/word_lm/train.py
+++ b/example/rnn/word_lm/train.py
@@ -24,8 +24,8 @@
 from module import *
 from mxnet.model import BatchEndParam
 
-parser = argparse.ArgumentParser(description='PennTreeBank LSTM Language Model')
-parser.add_argument('--data', type=str, default='./data/ptb.',
+parser = argparse.ArgumentParser(description='Sherlock Holmes LSTM Language Model')
+parser.add_argument('--data', type=str, default='./data/sherlockholmes.',
                     help='location of the data corpus')
 parser.add_argument('--emsize', type=int, default=650,
                     help='size of word embeddings')
diff --git a/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl b/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
index 8976e646500..326e57c5a6c 100755
--- a/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
+++ b/perl-package/AI-MXNet/examples/cudnn_lstm_bucketing.pl
@@ -98,11 +98,11 @@ =head1 SYNOPSIS
 func get_data($layout)
 {
     my ($train_sentences, $vocabulary) = tokenize_text(
-        './data/ptb.train.txt', start_label => $start_label,
+        './data/sherlockholmes.train.txt', start_label => $start_label,
         invalid_label => $invalid_label
     );
     my ($validation_sentences) = tokenize_text(
-        './data/ptb.test.txt', vocab => $vocabulary,
+        './data/sherlockholmes.test.txt', vocab => $vocabulary,
         start_label => $start_label, invalid_label => $invalid_label
     );
     my $data_train  = mx->rnn->BucketSentenceIter(
diff --git a/perl-package/AI-MXNet/examples/get_ptb_data.sh b/perl-package/AI-MXNet/examples/get_ptb_data.sh
deleted file mode 100755
index 2dc4034a938..00000000000
--- a/perl-package/AI-MXNet/examples/get_ptb_data.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/usr/bin/env bash
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-echo
-echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
-echo "See https://catalog.ldc.upenn.edu/ldc99t42 for the licensing"
-read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
-echo
-
-if [ $REPLY != "Y" ]
-then
-    echo "License was not reviewed, aborting script."
-    exit 1
-fi
-
-RNN_DIR=$(cd `dirname $0`; pwd)
-DATA_DIR="${RNN_DIR}/data/"
-
-if [[ ! -d "${DATA_DIR}" ]]; then
-  echo "${DATA_DIR} doesn't exist, will create one";
-  mkdir -p ${DATA_DIR}
-fi
-
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.train.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.valid.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/ptb/ptb.test.txt;
-wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh b/perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh
new file mode 100755
index 00000000000..43c8669e003
--- /dev/null
+++ b/perl-package/AI-MXNet/examples/get_sherlockholmes_data.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+echo
+echo "NOTE: To continue, you need to review the licensing of the data sets used by this script"
+echo "See https://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License for the licensing"
+read -p "Please confirm you have reviewed the licensing [Y/n]:" -n 1 -r
+echo
+
+if [ "$REPLY" != "Y" ]
+then
+    echo "License was not reviewed, aborting script."
+    exit 1
+fi
+
+RNN_DIR=$(cd `dirname $0`; pwd)
+DATA_DIR="${RNN_DIR}/data/"
+
+if [[ ! -d "${DATA_DIR}" ]]; then
+  echo "${DATA_DIR} doesn't exist, will create one";
+  mkdir -p ${DATA_DIR}
+fi
+
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.train.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.valid.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/sherlockholmes/sherlockholmes.test.txt;
+wget -P ${DATA_DIR} https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt;
diff --git a/perl-package/AI-MXNet/examples/lstm_bucketing.pl b/perl-package/AI-MXNet/examples/lstm_bucketing.pl
index e6699d79f0b..3618a62d1fb 100755
--- a/perl-package/AI-MXNet/examples/lstm_bucketing.pl
+++ b/perl-package/AI-MXNet/examples/lstm_bucketing.pl
@@ -44,7 +44,7 @@
 
 =head1 NAME
 
-    lstm_bucketing.pl - Example of training LSTM RNN on Penn Tree Bank data using high level RNN interface
+    lstm_bucketing.pl - Example of training an LSTM RNN on Sherlock Holmes data using the high-level RNN interface
 
 =head1 SYNOPSIS
 
@@ -84,11 +84,11 @@ =head1 SYNOPSIS
 my $invalid_label = 0;
 
 my ($train_sentences, $vocabulary) = tokenize_text(
-    './data/ptb.train.txt', start_label => $start_label,
+    './data/sherlockholmes.train.txt', start_label => $start_label,
     invalid_label => $invalid_label
 );
 my ($validation_sentences) = tokenize_text(
-    './data/ptb.test.txt', vocab => $vocabulary,
+    './data/sherlockholmes.test.txt', vocab => $vocabulary,
     start_label => $start_label, invalid_label => $invalid_label
 );
 my $data_train  = mx->rnn->BucketSentenceIter(
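
The Perl examples follow the same path switch. Assuming the new get_sherlockholmes_data.sh (added above under perl-package/AI-MXNet/examples) has populated ./data/, a hypothetical run of the bucketing example with its built-in defaults is simply:

```bash
# from perl-package/AI-MXNet/examples: fetch the corpus, then run the LSTM bucketing example
bash get_sherlockholmes_data.sh
perl lstm_bucketing.pl
```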
diff --git a/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm b/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
index aa495674faa..423b0aec996 100644
--- a/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
+++ b/perl-package/AI-MXNet/lib/AI/MXNet/Module/Bucketing.pm
@@ -33,11 +33,11 @@ AI::MXNet::Module::Bucketing
     my $invalid_label = 0;
 
     my ($train_sentences, $vocabulary) = tokenize_text(
-        './data/ptb.train.txt', start_label => $start_label,
+        './data/sherlockholmes.train.txt', start_label => $start_label,
         invalid_label => $invalid_label
     );
     my ($validation_sentences) = tokenize_text(
-        './data/ptb.test.txt', vocab => $vocabulary,
+        './data/sherlockholmes.test.txt', vocab => $vocabulary,
         start_label => $start_label, invalid_label => $invalid_label
     );
     my $data_train  = mx->rnn->BucketSentenceIter(
diff --git a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
index f3fe7641e23..44ee6e778d2 100644
--- a/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
+++ b/scala-package/examples/src/main/scala/org/apache/mxnetexamples/rnn/LstmBucketing.scala
@@ -35,9 +35,9 @@ import org.apache.mxnet.module.FitParams
  */
 class LstmBucketing {
   @Option(name = "--data-train", usage = "training set")
-  private val dataTrain: String = "example/rnn/ptb.train.txt"
+  private val dataTrain: String = "example/rnn/sherlockholmes.train.txt"
   @Option(name = "--data-val", usage = "validation set")
-  private val dataVal: String = "example/rnn/ptb.valid.txt"
+  private val dataVal: String = "example/rnn/sherlockholmes.valid.txt"
   @Option(name = "--num-epoch", usage = "the number of training epoch")
   private val numEpoch: Int = 5
   @Option(name = "--gpus", usage = "the gpus will be used, e.g. '0,1,2,3'")


 
