Posted to dev@systemml.apache.org by du...@gmail.com on 2016/01/25 19:49:19 UTC

Future Release Package Naming & Structure

Hi all,

A discussion regarding the release package structure started on pull request 54 [https://github.com/apache/incubator-systemml/pull/54].  Currently, we have a "distributed" release for running SystemML on a cluster* using Spark or Hadoop, as well as a "standalone" release for running SystemML on a single node with Java (no Spark or Hadoop installation necessary).  Given this, two questions were raised during the discussion:
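
For context, the two packages are invoked along these lines today (the script,
JAR, and option names below are illustrative and may differ slightly from what
actually ships in the packages):

    # "standalone" package: single-node execution with plain Java
    ./runStandaloneSystemML.sh myalgorithm.dml

    # "distributed" package: Hadoop batch mode
    hadoop jar SystemML.jar -f myalgorithm.dml

    # "distributed" package: Spark batch mode
    $SPARK_HOME/bin/spark-submit SystemML.jar -f myalgorithm.dml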

  1. Should we name the releases "*-cluster" and "*-standalone", or only distinguish the standalone version, naming them "*" and "*-standalone"?
  2. Should we maintain the two separate releases ("distributed" and "standalone"), or should we move to have one single release with one JAR that works in all environments and execution modes?

The consensus was that there are pros and cons for each option, and that this discussion would be more appropriate for the mailing list.

Thoughts?

Thanks,
- Mike

* Yes, SystemML can still be run in single node execution mode even on Spark or Hadoop.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


Re: Future Release Package Naming & Structure

Posted by Deron Eriksson <de...@gmail.com>.
Hi,

I would like a solution that is easy for the end user to understand.

Since the 'cluster/distrib' package seems to contain basically a subset of
the files in the 'standalone' package (the standalone package has a lib
directory of jars, with the systemml jar sitting in lib rather than in the
parent directory; it also has the 'runStandalone' scripts, a few more dml
scripts, etc.), it would seem to me that we could get rid of the
'cluster/distrib' package and just have the 'standalone' package, but drop
the '-standalone' naming. This would let an end user download the single
built .tar.gz/.zip without having to choose which .tar.gz/.zip to download.
A README inside could explain how to use SystemML in standalone mode
(currently via the runStandalone scripts using the lib dir's contents),
Hadoop batch mode, or Spark batch mode. Eventually, if possible, it would
be nice if all of these options (standalone, hadoop batch, spark batch)
could be run via the bin/systemml sh and bat scripts, something like
"bin/systemml -standalone -f myalgorithm.dml" when prototyping and
"bin/systemml -sparkcluster -f myalgorithm.dml" when distributing on a
Spark cluster.
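
To make that last idea concrete, here is a rough sketch of what such a
bin/systemml dispatcher could look like. The -standalone and -sparkcluster
flags follow the examples above; the -hadoop flag, the jar/lib locations, and
the main class name are my assumptions about a merged package layout, not a
description of the current scripts:

    #!/usr/bin/env bash
    # Hypothetical bin/systemml dispatcher -- a sketch, not the current script.
    # Assumes SystemML.jar sits in the package root, dependency jars in lib/,
    # and org.apache.sysml.api.DMLScript as the main class.
    set -e

    BASE_DIR="$(cd "$(dirname "$0")/.." && pwd)"

    if [ $# -lt 1 ]; then
      echo "Usage: bin/systemml [-standalone|-hadoop|-sparkcluster] -f <script.dml> [args]" >&2
      exit 1
    fi
    MODE="$1"; shift

    case "$MODE" in
      -standalone)
        # Single-node execution with plain Java, using the jars shipped in lib/.
        java -cp "$BASE_DIR/lib/*" org.apache.sysml.api.DMLScript "$@"
        ;;
      -hadoop)
        # Hadoop batch mode: hand the jar to the Hadoop client.
        hadoop jar "$BASE_DIR/SystemML.jar" "$@"
        ;;
      -sparkcluster)
        # Spark batch mode: submit the same jar through spark-submit.
        "$SPARK_HOME/bin/spark-submit" "$BASE_DIR/SystemML.jar" "$@"
        ;;
      *)
        echo "Unknown mode: $MODE" >&2
        exit 1
        ;;
    esac

With something like that in place, "bin/systemml -standalone -f myalgorithm.dml"
and "bin/systemml -sparkcluster -f myalgorithm.dml" would both run against the
same downloaded package.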

I would favor leaving the systemml jar file itself alone.

So, in summary:
(1) I like the idea of getting rid of the 'cluster/distrib' package and
removing the '-standalone' naming from the other package. Add a README
explaining how to use the remaining package for standalone, hadoop batch,
and spark batch.
(2) If possible, see if the bin/systemml scripts can be modified to allow
execution of the standalone, hadoop batch, and spark batch modes via
bin/systemml, so that the user has one single place from which to execute
SystemML (both for prototyping locally and for scaling algorithm execution
on a cluster).
(3) Don't alter the systemml jar file itself.

Anyone else have thoughts?

Deron


