Posted to issues@systemml.apache.org by "Mike Dusenberry (JIRA)" <ji...@apache.org> on 2016/09/01 22:22:21 UTC

[jira] [Commented] (SYSTEMML-845) Compare Performance of LeNet Scripts With & Without Using SystemML-NN

    [ https://issues.apache.org/jira/browse/SYSTEMML-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456798#comment-15456798 ] 

Mike Dusenberry commented on SYSTEMML-845:
------------------------------------------

[~mboehm7] Excellent, that fixed the issue!

In general, it would be useful to explore adding a debug/explain mode that specifically points out which operations are causing issues such as this one.  It would speed up the debugging of serious performance regressions like this.

> Compare Performance of LeNet Scripts With & Without Using SystemML-NN
> ---------------------------------------------------------------------
>
>                 Key: SYSTEMML-845
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-845
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler
>    Affects Versions: SystemML 0.11
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>         Attachments: convert.dml, lenet-train-spark-explain-recompile-hops.log, lenet-train-spark-explain.log, log08.03.16-1470268602.txt, mnist_lenet-train-spark-explain-recompile-hops.log, mnist_lenet-train-spark-explain.log, perf.sh, run.sh
>
>
> This JIRA issue tracks the comparison of the performance of the LeNet scripts with & without using SystemML-NN.  The goal is that they should have equal performance in terms of both accuracy and time.  Any difference will indicate areas for engine improvement.
> Scripts:
> * [mnist_lenet-train.dml | https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/mnist_lenet-train.dml] - LeNet script that *does* use the SystemML-NN library.
> * [lenet-train.dml | https://github.com/apache/incubator-systemml/blob/master/scripts/staging/lenet-train.dml] - LeNet script that *does not* use the SystemML-NN library.
> *Current Status - Forced Singlenode:*
> Equal performance when running the scripts in standalone mode with the {{-exec singlenode}} flag, 20GB of memory, and using data inputs in the SystemML binary format -- see {{run.sh}} and {{perf.sh}} for information.
> Results:
> - Run #1:
> || Script | Time (s) | Accuracy ||
> | mnist_lenet-train.dml | 2987.400704441 | 99.32% |
> | lenet-train.dml | 2816.369435579 | 99.28% |
> - Run #2:
> || Script | Time (s) | Accuracy ||
> | mnist_lenet-train.dml | 2847.790531812 | 99.16% |
> | lenet-train.dml | 2950.520494210 | 99.18% |
> So, same accuracy, and same runtime in singlenode mode!
> To fully reproduce: I created a directory, placed the two attached bash scripts in it, copied the NN library into the directory, ran the library's examples/get_mnist_data.sh script to get the data (placed into examples/data), used the attached convert.dml to create binary copies of the data for both scripts, and then ran run.sh. I also copied examples/data to the base directory.  Adjust the {{EXEC}} and related variables in {{perf.sh}} to switch between standalone, Spark, memory sizes, explain, stats, etc.
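
As a rough check on the "same runtime" conclusion drawn from the two result tables above, the timings can be compared as relative differences. This is only a sketch: the helper name relative_diff and the choice of lenet-train.dml as the baseline are assumptions made for illustration, and the numbers are copied from the tables rather than re-measured.

```python
# Runtimes in seconds, copied from the two result tables in the issue description.
WITH_NN = "mnist_lenet-train.dml"   # script that uses the SystemML-NN library
WITHOUT_NN = "lenet-train.dml"      # script that does not use the library

runs = {
    "Run #1": {WITH_NN: 2987.400704441, WITHOUT_NN: 2816.369435579},
    "Run #2": {WITH_NN: 2847.790531812, WITHOUT_NN: 2950.520494210},
}


def relative_diff(with_nn, without_nn):
    """Signed runtime difference of the NN-library script, relative to the
    plain script (hypothetical helper; positive means the NN script was slower)."""
    return (with_nn - without_nn) / without_nn


# Per-run relative difference.
per_run = {
    name: relative_diff(times[WITH_NN], times[WITHOUT_NN])
    for name, times in runs.items()
}

# Relative difference of the mean runtimes over both runs.
mean_with = sum(times[WITH_NN] for times in runs.values()) / len(runs)
mean_without = sum(times[WITHOUT_NN] for times in runs.values()) / len(runs)
mean_diff = relative_diff(mean_with, mean_without)

for name, diff in per_run.items():
    print(f"{name}: {diff:+.2%}")
print(f"Mean of both runs: {mean_diff:+.2%}")
```

The sign of the difference flips between the two runs, which suggests that the gap is run-to-run variation rather than a systematic cost of using the library. With only two runs per script, this is an indication, not a statistical test.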



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)