Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/02/22 22:02:50 UTC

[GitHub] daveliepmann opened a new issue #14239: Clojure RNN example does not match performance of pretrained model

URL: https://github.com/apache/incubator-mxnet/issues/14239
 
 
   ## Description
   The Clojure RNN/LSTM example, when trained for the same 75 epochs, does not match the output quality expected from (and seen in) the pre-trained model.
   
   ## Environment info (Required)
   Apologies, I don't have this info, as the test was run on an on-demand AWS machine. (p2.xlarge, AWS Deep Learning AMI for Ubuntu)
   
   I'm using the Clojure contrib package.
   
   ## Error Message:
   N/A
   
   ## Minimum reproducible example
   https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package/examples/rnn
   
   ## Steps to reproduce
   
   1. Run the RNN example that ships with MXNet, but train for 75 epochs (matching the pre-trained model) instead of the 2 used for demonstration
   2. Evaluate the output
   
   The pre-trained model, reportedly trained for 75 epochs on the Obama RNN/LSTM corpus using either the Python or Scala codebase, gives results like:
   >[The joke] of them war that this country dream. The American people can require medical bills and support for a few good service strong-skids.Meping prommastard edemach and John McCain. This
   
   I ran the Clojure example out of the box, changing only the number of epochs, and got results like this:
   >[The joke] thiptolty an whiend iomes funhilurld blonde ursk, wer orl orot. taced.MOuenckses t trora te ththay tioomones cato patorgilor dr isngr irelthes bey omoved. De sletor t, omesitorurieme time ro
   
   From [asking about this in Slack](https://the-asf.slack.com/archives/CEE9X9WN7/p1547638506024500) we identified two possible causes:
   
    - the Clojure example _may_ have different hyperparameters than those the pre-trained model was trained with: for instance, the Scala tutorial [docs](https://mxnet.incubator.apache.org/tutorials/scala/char_lstm.html) say the learning rate is 0.001 (not 0.01) and the weight decay is 0.00001 (not 0.0001). However, the current Scala source uses the same values as the Clojure example. It's unclear which codebase was used to create the pre-trained model, so other differences are possible.
    - [per @gigasquid / Carin](https://the-asf.slack.com/archives/CEE9X9WN7/p1547661974032400), the BucketIterator was not ported over from Scala:
      ```
      ;;; in the case of this fixed bucketing that only uses one bucket size - it is the equivalent of padding all sentences to a fixed length.
      ;; we are going to use ndarray-iter for this
      ;; converting the bucketing-iter over to use is todo. We could either push for the example Scala one to be included in the base package and interop with that (which would be nice for other rnn needs too) or hand convert it over ourselves
      ```
      ([source](https://github.com/apache/incubator-mxnet/blob/master/contrib/clojure-package/examples/rnn/src/rnn/train_char_rnn.clj#L70))
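
   For what it's worth, the documented-vs-source hyperparameter discrepancy is a consistent factor of ten. A minimal Python sketch making that concrete (the dict keys are illustrative labels, not MXNet parameter names; the values are the ones quoted above):

   ```python
   # Hyperparameters as given in the Scala char-LSTM tutorial docs,
   # versus those currently in the Scala/Clojure example source.
   documented = {"learning-rate": 0.001, "weight-decay": 0.00001}
   in_source  = {"learning-rate": 0.01,  "weight-decay": 0.0001}

   # Each value in the source is an order of magnitude larger than documented.
   ratios = {k: in_source[k] / documented[k] for k in documented}
   assert all(abs(r - 10.0) < 1e-6 for r in ratios.values())
   ```

   A 10x learning rate alone could plausibly account for a large quality gap over 75 epochs, which is why this is listed as a candidate cause.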
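
   To illustrate the second point: fixed bucketing with a single bucket pads every sequence to one global length, whereas a ported BucketIterator would pad each sequence only to the nearest bucket boundary. A rough Python sketch of the difference (function names and bucket sizes here are hypothetical, not from the MXNet API):

   ```python
   # Illustrative comparison: single-bucket (fixed-length) padding vs.
   # bucketed padding, as the missing BucketIterator would do it.

   def pad_fixed(seqs, pad=0):
       """Pad all sequences to the length of the longest one (single bucket)."""
       max_len = max(len(s) for s in seqs)
       return [s + [pad] * (max_len - len(s)) for s in seqs]

   def pad_bucketed(seqs, buckets, pad=0):
       """Pad each sequence up to the smallest bucket that fits it."""
       out = []
       for s in seqs:
           size = min(b for b in buckets if b >= len(s))
           out.append(s + [pad] * (size - len(s)))
       return out

   seqs = [[1, 2], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8]]
   fixed = pad_fixed(seqs)                           # every row padded to length 8
   bucketed = pad_bucketed(seqs, buckets=[2, 5, 8])  # rows of length 2, 5, 8
   ```

   With fixed-length padding, short sequences spend most of their timesteps on pad symbols during training, which is one way the missing BucketIterator could degrade results rather than just efficiency.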
   
   
   ## What have you tried to solve it?
   
   1. Ran a replication
   2. Asked in Slack

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services