Posted to dev@systemml.apache.org by Matthias Boehm <mb...@us.ibm.com> on 2015/12/05 02:16:44 UTC

Runtime package refactoring


Hi all,

Just a quick heads-up: I'd like to refactor our runtime package. The goals
are (1) to separate out all MR-related classes (cleanup), and (2) to
prepare our core matrix block runtime for packaging as an individual jar,
which would make it consumable as a small-footprint library. I intend to
make this change in the middle of next week.

Similar to the refactoring from 'com.ibm.bi.dml' to 'org.apache.sysml',
this change would break binary compatibility with existing datasets in
binary format because the class names are persisted in the sequence file
headers. A workaround is to use an old jar to convert your data from the
old binary format to text, and then a new jar to convert the text
representation to the new binary format.
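
To illustrate why class names stored in a file header tie the data to a
package layout, here is a minimal Java sketch. It uses neither Hadoop nor
SystemML: the class 'HeaderClassNameSketch', the nested
'MatrixBlockStandIn', and the name 'old.pkg.MatrixBlockStandIn' are
hypothetical stand-ins, and 'resolve' only mimics what a reader would have
to do with a class name taken from a header.

```java
public class HeaderClassNameSketch {

    /** Hypothetical stand-in for a block class at its new package location. */
    public static class MatrixBlockStandIn { }

    /**
     * Tries to load the class whose name was recorded in a file header.
     * A reader can only deserialize the data if this lookup succeeds.
     */
    public static String resolve(String classNameFromHeader) {
        try {
            Class<?> c = Class.forName(classNameFromHeader);
            return "resolved " + c.getName();
        } catch (ClassNotFoundException e) {
            return "unreadable: no class named " + classNameFromHeader;
        }
    }

    public static void main(String[] args) {
        // Name that a "new jar" would record in the header.
        String newName = MatrixBlockStandIn.class.getName();
        // Assumed name that an "old jar" recorded; no such class exists here.
        String oldName = "old.pkg.MatrixBlockStandIn";

        System.out.println(resolve(newName)); // lookup succeeds
        System.out.println(resolve(oldName)); // lookup fails after the rename
    }
}
```

A text representation carries no class names, which is why the detour
through text with the old and new jars avoids the failed lookup.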

Here is the proposed package structure:

org.apache.sysml.runtime
--controlprogram [...]
--core
----matrix
----funobj
----operators
--instructions [...]
--io
--mapred
----data
----hadoopfix
----jobs
----tasks
----sort
--parfor [...]
--transform
--util

Given this structure we could simply package 'core'/'util' and perhaps 'io'
into a separate jar.


Regards,
Matthias

Re: Runtime package refactoring

Posted by Luciano Resende <lu...@gmail.com>.
On Sat, Dec 5, 2015 at 3:17 PM, Matthias Boehm <mb...@us.ibm.com> wrote:

> yes, these changes are all local to 'org.apache.sysml.runtime'. Other
> than binary format incompatibility, there are no other side effects for MR
> or Spark. These changes are primarily a cleanup of a historically grown
> package structure and a preparation step. For now, there will be still just
> one assembly - down the road however, this allows us to create a separate
> artifact of the core runtime library (which is already used by all three
> CP/MR/Spark runtime backends) for external usage too.
>
>
> Regards,
> Matthias
>

Thanks for the clarification.

Also, when implementing, please follow the steps below to make sure we
don't lose file history:

- Perform the refactoring on your own fork (not on Apache git)
- Move the files as one git commit
- Make all the file content changes (imports, docs, javadocs, etc.) as a
second git commit
- Run a full build to make sure there are no breakages
- Let the team review to make sure we are not losing history on the files
or something similar.

Thanks

-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Runtime package refactoring

Posted by Matthias Boehm <mb...@us.ibm.com>.
Yes, these changes are all local to 'org.apache.sysml.runtime'. Other than
the binary format incompatibility, there are no side effects for MR or
Spark. These changes are primarily a cleanup of a historically grown
package structure and a preparation step. For now, there will still be just
one assembly. Down the road, however, this allows us to create a separate
artifact of the core runtime library (which is already used by all three
runtime backends: CP, MR, and Spark) for external usage too.


Regards,
Matthias



From:	Luciano Resende <lu...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	12/05/2015 01:13 PM
Subject:	Re: Runtime package refactoring



On Fri, Dec 4, 2015 at 5:16 PM, Matthias Boehm <mb...@us.ibm.com> wrote:

>
>
> Hi all,
>
> just a quick heads-up, I'd like to do a refactoring of our runtime package.
> The goals are (1) to separate out all mr-related classes (cleanup), and (2)
> to prepare our core matrix block runtime for packaging as an individual jar
> which would make it consumable as a small-footprint library. I intend to
> make this change mid next week.
>
> Similar to the refactoring from 'com.ibm.bi.dml' to 'org.apache.sysml',
> this change would break binary compatibility with existing datasets in
> binary format because the class names are persistent in the sequence file
> headers. A workaround is to use an old jar to convert your data from the
> old binary format to text, and a new jar to convert the text representation
> to the new binary format.
>
> Here is the proposed package structure:
>
> org.apache.sysml.runtime
> --controlprogram [...]
> --core
> ----matrix
> ----funobj
> ----operators
> --instructions [...]
> --io
> --mapred
> ----data
> ----hadoopfix
> ----jobs
> ----tasks
> ----sort
> --parfor [...]
> --transform
> --util
>

I am assuming these changes are all under org.apache.sysml.runtime


>
> Given this structure we could simply package 'core'/'util' and perhaps 'io'
> into a separate jar.
>
>
A few questions:

- What would be the side effects for the integration with the different
runtimes (MR/Spark)?
- Is this just a local build modularization issue, and are we still
planning to generate ONE distribution assembly?


>
> Regards,
> Matthias
>

Also, as we experienced multiple issues with the package refactoring, I
would recommend the following:

- Perform the refactoring on your own fork (not on Apache git)
- Move the files as one git commit
- Make all the file content changes (imports, docs, javadocs, etc.) as a
second git commit
- Run a full build to make sure there are no breakages
- Let the team review to make sure we are not losing history on the files
or something similar.

Thank you

--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

