Posted to dev@systemds.apache.org by Janardhan <ja...@gmail.com> on 2020/08/17 12:51:01 UTC

Re: [DISCUSS] Apache SystemDS 2.0 Release

Hi,

The following is the status of the MLContext test for algorithms.

1. l2svm, msvm, PCA - scripts run, but the results do not match R
2. Autoencoder, StepwiseReg - scripts do not run
3. KMeans, GLM (need to fix R) - no R script

Thank you,
Janardhan

On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com> wrote:

> thanks for the perspective, I think we should be very pragmatic
> regarding languages. Let's stick to DML as our domain-specific language
> with R-like syntax, but add language bindings such as the Python API
> (and others) to seamlessly plug into common data science workflows. A
> similar mindset worked very well in the internals too: Java for nicely
> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
> kernels and native libraries where necessary.
>
> Regards,
> Matthias
>
> On 7/9/2020 3:54 PM, Janardhan wrote:
> > DML - %*% seems more intuitive than @. Let us not change the syntax
> > (our selling point: easy porting to R!)
> > Python - no solid opinion
> >
> > - Janardhan
> >
> > On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com> wrote:
> >
> >> For the Python API this is fine, but not for DML, as we should stick as
> >> close as possible to R syntax. We once had a pydml syntax too, but this
> >> created lots of inconsistencies and could not use Python as a host
> >> language. So, I think restricting such changes to the Python API is a
> >> good path forward. Other opinions?
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
> >>> Hi all
> >>>
> >>>
> >>> Can I suggest a radical change to matrix multiply:
> >>> changing the operator from %*% to @?
> >>>
> >>> Python has made this commitment!
> >>>
> >>>
> >>> https://www.python.org/dev/peps/pep-0465/
> >>>
> >>>
> >>> Or at least change this in the Python API?
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Sebastian
> >>>
> >>> ________________________________
> >>> From: Matthias Boehm <mb...@gmail.com>
> >>> Sent: Wednesday, July 8, 2020 11:04:12 PM
> >>> To: dev@systemds.apache.org
> >>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0 release
> >>> (feature freeze August 21). This should give us enough time to figure
> >>> out the list of things that still should go into this release, as a
> >>> major release is an opportunity for changes of external behavior.
> >>> However, as it's the first SystemDS Apache release, I think we should
> >>> still stick to Spark 2.x and Java 8 and consider upgrades of Spark and
> >>> the JDK for subsequent releases. So, what do you think, and are there
> >>> any major features you'd like to see complete for 2.0?
> >>>
> >>> Regards,
> >>> Matthias
> >>>
> >>
> >
>
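
The Python-API-only option discussed in the quoted messages above (keep %*%
in DML, accept @ in Python) could look roughly like the sketch below. It
relies on PEP 465, under which Python evaluates `a @ b` by calling
`a.__matmul__(b)`. The class `MatrixExpr` and its `dml` attribute are
hypothetical names made up for this illustration; they are not the SystemDS
Python API.

```python
class MatrixExpr:
    """Hypothetical expression node that records the DML text it stands for."""

    def __init__(self, dml):
        self.dml = dml

    def __matmul__(self, other):
        # PEP 465: Python evaluates `a @ b` as `a.__matmul__(b)`.
        # The sketch translates it to DML's R-style `%*%` operator.
        if not isinstance(other, MatrixExpr):
            return NotImplemented
        return MatrixExpr(f"({self.dml} %*% {other.dml})")


X = MatrixExpr("X")
w = MatrixExpr("w")
print((X @ w).dml)  # (X %*% w)
```

With this kind of binding, users write @ on the Python side while the
generated script keeps the R-like syntax, so DML itself stays unchanged.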

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Janardhan <ja...@gmail.com>.
Hi Arnab,

Our team has contributed the following:

1. Thoroughly documented the builtin functions
2. Starter template for working with Databricks and Colab

Thank you,
Janardhan

On Mon, Sep 7, 2020 at 6:59 PM Shafaq Siddiqi <sh...@tugraz.at>
wrote:

> Hi Arnab,
>
> The changes I contributed are as follows:
>
> Built-ins:
>     -   dropInvalidLength() and dropInvalidType(): frame built-ins for
>         data cleaning using schema and length information.
>     -   glm(): Generalized Linear Model added as a built-in from our
>         algorithms.
>     -   imputeFD(): missing value imputation using robust functional
>         dependencies.
>     -   MICE: updated the existing built-in (now works on matrices
>         instead of frames).
>     -   map(): support for lambda expressions.
>     -   smote(): an oversampling technique for class imbalance.
>     -   na_locf(): forward and backward NA filling.
>     -   gmm(): Gaussian mixture model (experimental feature)
>
> Binary Operations:
>    -   Comparison operations for frame-frame ops.
>
> Feel free to make any changes you deem necessary.
>
> Best Regards,
> Shafaq Siddiqi
>
> On 9/7/2020 9:51 AM, Baunsgaard, Sebastian wrote:
> > Hi Arnab,
> >
> > Here is my list, feel free to remove elements 😊
> >
> > Major:
> >
> > - Refactor Compression package and add functions
> >    - add Quantization for lossy compression
> >    - Generalize column groups to use same base dictionary
> >    - Binary cell operations
> >    - Left Matrix Multiplication
> > - GitHub actions for automated testing
> > - Improved Compile times, and packaging
> > - Docker containers for systemds, pythonsystemds and testingsystemds
> >
> > Minor:
> >
> > - python PCA and MultiLogReg algorithms
> > - parallel sort
> > - parallel detect schema
> > - URL handler for federated
> > - Distinct values count / estimation function
> > - Simplified Log4J from being Hadoop based to our own
> > - Handle NaStrings in CSV reading frame and matrix
> > - Re-enable code coverage tools
> >
> > Removed
> >
> > - GitHub Pages for documentation (documentation moved to master)
> > - Travis testing
> >
> >
> > Best regards
> >
> > Sebastian
> >
> > ________________________________
> > From: arnab phani <ph...@gmail.com>
> > Sent: Monday, September 7, 2020 9:26:12 AM
> > To: dev@systemds.apache.org
> > Subject: Re: [DISCUSS] Apache SystemDS 2.0 Release
> >
> > Thanks Kevin.
> >
> > Other committers: once you get a chance, please send me your
> > contributions too.
> >
> > Regards,
> > Arnab..
> >
> > On Wed, Sep 2, 2020 at 10:04 PM Kevin Innerebner <
> > innerebner@student.tugraz.at> wrote:
> >
> >> Hi,
> >>
> >> here are the changes I contributed after March 24:
> >>
> >> - Added SystemDSContext to Python API (now necessary for operations)
> >>
> >> - Added federated frames
> >>
> >> - Federated transform-encode, -decode and -apply (missing value
> >> imputation is still an ongoing PR, I think it will be merged in before
> >> release)
> >>
> >> - New builtin `colnames()` to get the column names of a frame
> >>
> >> That should be everything from my side.
> >>
> >> Regards,
> >> Kevin
> >>
> >> On 9/1/20 11:36 AM, arnab phani wrote:
> >>> Hi All,
> >>>
> >>> As we are nearing the release, I am starting to focus on the release
> >>> notes. Notes for SystemDS 2.0 release should consolidate all the things
> >>> that happened since Aug 2018 (last SystemML release).
> >>> While I will aggregate the notes from two SystemDS releases, it will be
> >>> great if you can update me with a few lines summarizing the additions
> >>> to your features (including the external contributions), especially
> >>> after March 24, 2020 (last SystemDS release).
> >>>
> >>> Once ready, I will share for everyone to have a look.
> >>>
> >>> Regards,
> >>> Arnab..
> >>>
> >>> On Mon, Aug 31, 2020 at 8:34 PM Matthias Boehm <mb...@gmail.com> wrote:
> >>>> thanks Arnab for looking over the remaining open issues. Together with
> >>>> Shafaq, we just came across two additional bugs related to eval
> >>>> function calls. These fixes should go into the RC, and I intend to fix
> >>>> the bugs as soon as possible.
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 8/27/2020 8:41 PM, arnab phani wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> Currently, I see only a few issues are flagged for 2.0 release. Can
> >>>>> you please go through your open issues and check if the Fix-Version is
> >>>>> set? Also, if a JIRA task doesn't exist for something you are working
> >>>>> on or want to have in the coming release, please open a task and flag
> >>>>> it for 2.0.
> >>>>>
> >>>>> Regards,
> >>>>> Arnab..
> >>>>>
> >>>>> On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mb...@gmail.com> wrote:
> >>>>>> as the target release date end of August comes closer, I'd like to
> >>>>>> share that Arnab Phani kindly volunteered in an offline discussion to
> >>>>>> act as the release manager for our 2.0 release.
> >>>>>>
> >>>>>> Please flag issues and features you think are important for the 2.0
> >>>>>> release as such in JIRA so we can monitor them, discuss them on a
> >>>>>> case-by-case basis, and push the release date if necessary. Thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Matthias

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Shafaq Siddiqi <sh...@tugraz.at>.
Hi Arnab,

The changes I contributed are as follows:

Built-ins:
    -   dropInvalidLength() and dropInvalidType(): frame built-ins for
        data cleaning using schema and length information.
    -   glm(): Generalized Linear Model added as a built-in from our
        algorithms.
    -   imputeFD(): missing value imputation using robust functional
        dependencies.
    -   MICE: updated the existing built-in (now works on matrices
        instead of frames).
    -   map(): support for lambda expressions.
    -   smote(): an oversampling technique for class imbalance.
    -   na_locf(): forward and backward NA filling.
    -   gmm(): Gaussian mixture model (experimental feature)

Binary Operations:
   -   Comparison operations for frame-frame ops.
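
To make the na_locf() entry in the built-ins list concrete, here is a minimal
Python sketch of "last observation carried forward" filling and its backward
counterpart. It only illustrates the general technique; the function name
`na_locf_sketch`, its `backward` parameter, and the choice to leave
unfillable edge values as NaN are assumptions for this example and do not
describe the actual SystemDS built-in or its signature.

```python
import math


def na_locf_sketch(values, backward=False):
    """Fill NaN entries with the nearest earlier (or later) observed value."""
    # Backward filling is forward filling on the reversed sequence.
    seq = list(reversed(values)) if backward else list(values)
    filled = []
    last = math.nan  # nothing observed yet, so leading NaNs stay NaN
    for v in seq:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return list(reversed(filled)) if backward else filled


data = [math.nan, 1.0, math.nan, math.nan, 4.0, math.nan]
forward = na_locf_sketch(data)                  # carries 1.0 and 4.0 forward
backward = na_locf_sketch(data, backward=True)  # carries 4.0 and 1.0 backward
```

In this sketch the forward pass cannot fill a leading NaN and the backward
pass cannot fill a trailing one, since no observation exists in that
direction.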

Feel free to make any changes you deem necessary.

Best Regards,
Shafaq Siddiqi


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by "Baunsgaard, Sebastian" <ba...@tugraz.at>.
Hi Arnab,

Here is my list, feel free to remove elements 😊

Major:

- Refactor Compression package and add functions
  - add Quantization for lossy compression
  - Generalize column groups to use same base dictionary
  - Binary cell operations
  - Left Matrix Multiplication
- GitHub actions for automated testing
- Improved Compile times, and packaging
- Docker containers for systemds, pythonsystemds and testingsystemds

Minor:

- python PCA and MultiLogReg algorithms
- parallel sort
- parallel detect schema
- URL handler for federated
- Distinct values count / estimation function
- Simplified Log4J from being Hadoop based to our own
- Handle NaStrings in CSV reading frame and matrix
- Re-enable code coverage tools

Removed

- GitHub Pages for documentation (documentation moved to master)
- Travis testing


Best regards

Sebastian


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Thanks Kevin.

Other committers: once you get a chance, please send me your contributions
too.

Regards,
Arnab..


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Kevin Innerebner <in...@student.tugraz.at>.
Hi,

Here are the changes I contributed after March 24:

- Added SystemDSContext to the Python API (now required for operations)

- Added federated frames

- Federated transform-encode, -decode and -apply (missing value
imputation is still an open PR; I think it will be merged before the
release)

- New builtin `colnames()` to get the column names of a frame

That should be everything from my side.

Regards,
Kevin

On 9/1/20 11:36 AM, arnab phani wrote:
> Hi All,
>
> As we are nearing the release, I am starting to focus on the release notes.
> Notes for SystemDS 2.0 release should consolidate all the things that
> happened since Aug 2018 (last SystemML release).
> While I will aggregate the notes from two SystemDS releases, it will be
> great if you can update me with a few lines summarizing the additions to
> your features (including the external contributions), especially after
> March 24, 2020 (last SystemDS release).
>
> Once ready, I will share for everyone to have a look.
>
> Regards,
> Arnab..
>
> On Mon, Aug 31, 2020 at 8:34 PM Matthias Boehm <mb...@gmail.com> wrote:
>
>> thanks Arnab for looking over the remaining open issues. Together with
>> Shafaq, we just came across two additional bugs related to eval function
>> calls. These fixes should go into the RC and I intend to fix them as
>> soon as possible.
>>
>> Regards,
>> Matthias
>>
>> On 8/27/2020 8:41 PM, arnab phani wrote:
>>> Hi All,
>>>
>>> Currently, I see only a few issues flagged for the 2.0 release. Can you
>>> please go through your open issues and check if the Fix-Version is set?
>>> Also, if a JIRA task doesn't exist for something you are working on or
>>> want to have in the coming release, please open a task and flag it for 2.0.
>>>
>>> Regards,
>>> Arnab..
>>>
>>> On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mb...@gmail.com> wrote:
>>>> as the target release date at the end of August comes closer, I'd like to
>>>> share that Arnab Phani kindly volunteered in an offline discussion to act
>>>> as the release manager for our 2.0 release.
>>>>
>>>> Please, flag issues and features you think are important for the 2.0
>>>> release as such in JIRA so we can monitor them, discuss them on a case
>>>> by case basis, and push the release date if necessary. Thanks.
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>> On 8/17/2020 2:51 PM, Janardhan wrote:
>>>>> Hi,
>>>>>
>>>>> The following is the status of the MLContext test for algorithms.
>>>>>
>>>>> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
>>>>> 2. Autoencoder, StepwiseReg - Scripts are not running
>>>>> 3. KMeans, GLM (need to fix R) - No R script
>>>>>
>>>>> Thank you,
>>>>> Janardhan
>>>>>
>>>>> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com> wrote:
>>>>>> thanks for the perspective, I think we should be very pragmatic
>>>>>> regarding languages. Let's stick to DML as our domain-specific language
>>>>>> with R-like syntax, but add language bindings such as the Python API
>>>>>> (and others) to seamlessly plug into common data science workflows. A
>>>>>> similar mind set worked very well in the internals too: Java for nicely
>>>>>> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
>>>>>> kernels and native libraries where necessary.
>>>>>>
>>>>>> Regards,
>>>>>> Matthias
>>>>>>
>>>>>> On 7/9/2020 3:54 PM, Janardhan wrote:
>>>>>>> DML - %*% seems more intuitive compared to @. Let us not change the
>>>>>>> syntax (our selling point: easy porting to R!)
>>>>>>> Python - no solid opinion
>>>>>>>
>>>>>>> - Janardhan
>>>>>>>
>>>>>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com> wrote:
>>>>>>>> for the Python API this is fine, but not for DML, as we should stick as
>>>>>>>> close as possible to R syntax. Once we had a pydml syntax too, but this
>>>>>>>> created lots of inconsistencies and could not use Python as a host
>>>>>>>> language. So, I think restricting such changes to the Python API is a
>>>>>>>> good path forward. Other opinions?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Matthias
>>>>>>>>
>>>>>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> Can I suggest a radical change to matrix multiply:
>>>>>>>>> changing the operator from %*% to @?
>>>>>>>>>
>>>>>>>>> Python has made this commitment!
>>>>>>>>>
>>>>>>>>> https://www.python.org/dev/peps/pep-0465/
>>>>>>>>>
>>>>>>>>> Or at least change this in the Python API?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> From: Matthias Boehm <mb...@gmail.com>
>>>>>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
>>>>>>>>> To: dev@systemds.apache.org
>>>>>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0
>>>>>>>>> release (feature freeze August 21). This should give us enough time
>>>>>>>>> to figure out the list of things that still should go into this
>>>>>>>>> release, as a major release is an opportunity for changes of external
>>>>>>>>> behavior. However, as it's the first SystemDS Apache release, I think
>>>>>>>>> we should still stick to Spark 2.x and Java 8 and consider upgrades of
>>>>>>>>> Spark and the JDK for subsequent releases. So, what do you think, and
>>>>>>>>> are there any major features you'd like to see complete for 2.0?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Matthias
>>>>>>>>>
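Editorial sketch for readers of the archive: the operator proposal quoted above refers to PEP 465, which lets a Python class define matrix multiplication for `@` through the `__matmul__` method. The `Matrix` class below is a hypothetical illustration written for this note; it is not the SystemDS Python API, and it only shows how a Python binding could expose `@` while DML itself keeps the R-like `%*%`.

```python
class Matrix:
    """Tiny dense matrix, used only to illustrate the @ operator (PEP 465)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Python calls this method when evaluating `self @ other`.
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other.rows))  # columns of the right-hand operand
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
C = A @ B  # the same product would be written A %*% B in DML or R
print(C)   # Matrix([[19, 22], [43, 50]])
```

Keeping the operator in the host-language binding, as suggested in the thread, means DML scripts are unaffected by this choice.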

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Janardhan <ja...@gmail.com>.
Thanks for working on this, Sebastian.

I will review & test this work tomorrow.

Thank you,
Janardhan


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by "Baunsgaard, Sebastian" <ba...@tugraz.at>.
Hi Janardhan,


Thanks for pointing this out!

They were not put into master because the examples do not work the same way.

I will try to make a reasonable extract of these documents in a PR that fits the current system.

If you want to take a look at that PR, it would be much appreciated!


Thanks again

Sebastian



Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Janardhan <ja...@gmail.com>.
Hi Arnab,

We do not seem to have algorithms documentation,
for example an equivalent of this page [1].

Ignore this message if it already exists.

[1]  http://systemds.apache.org/docs/1.2.0/algorithms-reference.html

Thank you,
Janardhan


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Hi All,

Thanks for fixing the remaining issues.
I will cut the first release candidates later this afternoon (CET).

Regards,
Arnab..


Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Hi All,

Thanks for fixing the bugs.
A few tasks/bugs (see below) are still being worked on and will hopefully
be closed in a couple of days.
We should then be ready to distribute the release candidates by the end
of this week.

- Python tutorials,
- failing GPU tests,
- error in loading native BLAS.

Regards,
Arnab..

On Thu, Sep 10, 2020 at 11:24 AM arnab phani <ph...@gmail.com> wrote:

> Thank you all for the notes.
> Please find the consolidated release notes below, and please let me know
> if anything major is missing.
>
> *Release notes for SystemDS 2.0.*
>
> SystemDS 2.0 is the first major release under the new name. This release
> contains a major refactoring, a few major features, a large number of
> improvements and fixes, and some experimental features to better support
> the end-to-end data science lifecycle. In addition, this release also
> removes several features that are outdated or not up to the mark.
>
> The major changes (compared to SystemML 1.2) include:
>
>
>    - New mechanism for DML-bodied (script-level) builtin functions, and a
>    wealth of new built-in functions for data preprocessing including data
>    cleaning, augmentation and feature engineering techniques, new ML
>    algorithms, and model debugging.
>    - Several methods for data cleaning have been implemented, including
>    multiple imputation with multivariate imputation by chained equations
>    (MICE) and other techniques, SMOTE (an oversampling technique for class
>    imbalance), forward and backward NA filling, cleaning using schema and
>    length information, support for outlier detection using standard deviation
>    and inter-quartile range, and functional dependency discovery.
>    - A complete framework for lineage tracing and reuse, including support
>    for loop deduplication, full and partial reuse, compiler-assisted reuse,
>    and several new rewrites to facilitate reuse.
>    - New federated runtime backend, including support for federated
>    matrices and frames, and federated builtins (transform-encode, decode etc.).
>    - Refactored compression package with added functionality, including
>    quantization for lossy compression, binary cell operations, and left
>    matrix multiplication.
>    - New Python bindings with support for several builtins, matrix
>    operations, federated tensors, and lineage traces.
>    - CUDA implementation of cumulative aggregate operators (cumsum,
>    cumprod etc.)
>    - New model debugging technique with slice finder.
>    - New tensor data model (basic tensors of different value types, data
>    tensors with schema) [experimental]
>    - Cloud deployment scripts for AWS and scripts to set up and start
>    federated operations.
>    - Performance improvements with parallel sort, GPU cumulative
>    aggregates, append cbind etc.
>    - Various compiler and runtime improvements, including new and
>    improved rewrites, reduced Spark context creation, a new eval framework,
>    list operations, and updated native kernel libraries, to name a few.
>    - New data reader/writer for JSON frames and support for SQL as a data
>    source.
>    - Miscellaneous improvements: improved documentation, better testing,
>    run/release scripts, improved packaging, Docker container for SystemDS,
>    and bug fixes.
>    - Removed MapReduce compiler and runtime backend, pydml parser,
>    Java-UDF framework, and script-level debugger.
>
>
> Regards,
> Arnab.
>
>
> On Tue, Sep 8, 2020 at 4:10 AM Mark Dokter <md...@know-center.at> wrote:
>
>> On 01.09.20 11:36, arnab phani wrote:
>> > While I will aggregate the notes from two SystemDS releases, it will be
>> > great if you can update me with a few lines summarizing the additions to
>> > your features (including the external contributions), especially after
>> > March 24, 2020 (last SystemDS release).
>>
>> Hi Arnab!
>>
>> My contributions:
>>
>> - new run script
>> - improve/simplify release scripts
>> - various release related things (improve documentation, fix license
>> headers, clean up pom.xml, etc)
>> - cuda implementation of cumulative aggregate operators (cumsum,
>> cumprod, etc)
>> - bug fixes here and there
>> - maintain native blas support in a working state (now also supporting
>> windows)
>> - kmeans builtin dml function
>> - builtins for image augmentation
>>
>> Best,
>> Mark
>>
>

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Thank you all for the notes.
Please find the consolidated release notes below, and please let me know if
anything major is missing.

*Release notes for SystemDS 2.0.*

SystemDS 2.0 is the first major release under the new name. This release
contains a major refactoring, a few major features, a large number of
improvements and fixes, and some experimental features to better support
the end-to-end data science lifecycle. In addition, this release removes
several outdated features that were not up to the mark.

The major changes (compared to SystemML 1.2) include:


   - New mechanism for DML-bodied (script-level) builtin functions, and a
   wealth of new builtin functions for data preprocessing, including data
   cleaning, augmentation, and feature engineering techniques, new ML
   algorithms, and model debugging.
   - Several methods for data cleaning, including multiple imputation with
   multivariate imputation by chained equations (MICE) and other
   techniques, SMOTE (an oversampling technique for class imbalance),
   forward and backward NA filling, cleaning using schema and length
   information, outlier detection using standard deviation and
   interquartile range, and functional dependency discovery.
   - A complete framework for lineage tracing and reuse, including support
   for loop deduplication, full and partial reuse, compiler-assisted reuse,
   and several new rewrites to facilitate reuse.
   - New federated runtime backend, including support for federated
   matrices and frames, and federated builtins (transform-encode, decode,
   etc.).
   - Refactored compression package with new functionality, including
   quantization for lossy compression, binary cell operations, and left
   matrix multiplication.
   - New Python bindings with support for several builtins, matrix
   operations, federated tensors, and lineage traces.
   - CUDA implementation of cumulative aggregate operators (cumsum,
   cumprod, etc.).
   - New model debugging technique with slice finder.
   - New tensor data model (basic tensors of different value types, data
   tensors with schema) [experimental].
   - Cloud deployment scripts for AWS and scripts to set up and start
   federated operations.
   - Performance improvements with parallel sort, GPU cumulative
   aggregates, append cbind, etc.
   - Various compiler and runtime improvements, including new and improved
   rewrites, reduced Spark context creation, a new eval framework, list
   operations, and updated native kernel libraries, to name a few.
   - New data reader/writer for JSON frames and support for SQL as a data
   source.
   - Miscellaneous improvements: improved documentation, better testing,
   run/release scripts, improved packaging, a Docker container for
   SystemDS, and bug fixes.
   - Removed the MapReduce compiler and runtime backend, the pydml parser,
   the Java-UDF framework, and the script-level debugger.


Regards,
Arnab.
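
For readers unfamiliar with the data-cleaning techniques named in the
release notes above, here is a minimal sketch of three of them: forward
and backward NA filling, and outlier detection by interquartile range and
by standard deviation. It is plain Python using only the standard library;
the function names and default thresholds are assumptions made for this
illustration and are not the SystemDS builtins or their signatures.

```python
import math
import statistics


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value.

    Leading NaNs have no earlier value to copy, so they stay NaN.
    """
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Replace each NaN with the next non-NaN value (trailing NaNs stay)."""
    return forward_fill(values[::-1])[::-1]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def sd_outliers(values, k=3.0):
    """Flag values more than k population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mean) > k * sd for v in values]


if __name__ == "__main__":
    series = [math.nan, 1.0, math.nan, math.nan, 4.0, math.nan]
    print(forward_fill(series))
    print(backward_fill(series))
    print(iqr_outliers([1, 2, 3, 4, 5, 100]))
    print(sd_outliers([1, 2, 3, 4, 5, 100], k=2.0))
```

Note that on very small samples a k=3 standard-deviation rule can never
fire, which is why the usage line above passes k=2.0.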


On Tue, Sep 8, 2020 at 4:10 AM Mark Dokter <md...@know-center.at> wrote:

> On 01.09.20 11:36, arnab phani wrote:
> > While I will aggregate the notes from two SystemDS releases, it will be
> > great if you can update me with a few lines summarizing the additions to
> > your features (including the external contributions), especially after
> > March 24, 2020 (last SystemDS release).
>
> Hi Arnab!
>
> My contributions:
>
> - new run script
> - improve/simplify release scripts
> - various release related things (improve documentation, fix license
> headers, clean up pom.xml, etc)
> - cuda implementation of cumulative aggregate operators (cumsum,
> cumprod, etc)
> - bug fixes here and there
> - maintain native blas support in a working state (now also supporting
> windows)
> - kmeans builtin dml function
> - builtins for image augmentation
>
> Best,
> Mark
>

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Mark Dokter <md...@know-center.at>.
On 01.09.20 11:36, arnab phani wrote:
> While I will aggregate the notes from two SystemDS releases, it will be
> great if you can update me with a few lines summarizing the additions to
> your features (including the external contributions), especially after
> March 24, 2020 (last SystemDS release).

Hi Arnab!

My contributions:

- new run script
- improve/simplify release scripts
- various release-related things (improve documentation, fix license
headers, clean up pom.xml, etc.)
- CUDA implementation of cumulative aggregate operators (cumsum,
cumprod, etc.)
- bug fixes here and there
- maintain native BLAS support in a working state (now also supporting
Windows)
- kmeans builtin DML function
- builtins for image augmentation

Best,
Mark
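
As background on the cumulative aggregate operators (cumsum, cumprod)
mentioned above: they are prefix scans, and scans can be computed in a
logarithmic number of rounds in which all element updates are independent.
The sketch below shows the classic Hillis-Steele inclusive scan in plain
Python. It only illustrates the general idea; it is not the SystemDS CUDA
kernel, and the function name is an assumption made for this example.

```python
import operator


def inclusive_scan(values, op=operator.add):
    """Hillis-Steele inclusive scan for an associative operator `op`.

    Runs in ceil(log2(n)) rounds. Within a round, every update reads only
    the previous round's snapshot, so on parallel hardware all updates in
    a round could run at the same time.
    """
    out = list(values)
    step = 1
    while step < len(out):
        prev = out[:]  # snapshot of the previous round
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out


if __name__ == "__main__":
    print(inclusive_scan([1, 2, 3, 4]))                # cumsum
    print(inclusive_scan([1, 2, 3, 4], operator.mul))  # cumprod
```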

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Hi All,

As we are nearing the release, I am starting to focus on the release notes.
The notes for the SystemDS 2.0 release should consolidate everything that
has happened since Aug 2018 (last SystemML release).
While I will aggregate the notes from the two SystemDS releases, it would
be great if you could send me a few lines summarizing the additions to
your features (including external contributions), especially those after
March 24, 2020 (last SystemDS release).

Once ready, I will share them for everyone to have a look.

Regards,
Arnab..

On Mon, Aug 31, 2020 at 8:34 PM Matthias Boehm <mb...@gmail.com> wrote:

> thanks Arnab for looking over the remaining open issues. Together with
> Shafaq, we just came across two additional bugs related to eval function
> calls. These fixes should go into the RC and I intend to fix them as
> soon as possible.
>
> Regards,
> Matthias
>
> On 8/27/2020 8:41 PM, arnab phani wrote:
> > Hi All,
> >
> > Currently, I see only a few issues are flagged for 2.0 release. Can you
> > please go through your open issues and check if the Fix-Version is set?
> > Also, if a JIRA task doesn't exist for something you are working on or
> want
> > to have in the coming release, please open a task and flag it for 2.0.
> >
> > Regards,
> > Arnab..
> >
> > On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mb...@gmail.com>
> wrote:
> >
> >> as the target release date end of August comes closer, I'd like to share
> >> that Arnab Phani kindly volunteered in an offline discussion to act as
> >> the release manager for our 2.0 release.
> >>
> >> Please, flag issues and features you think are important for the 2.0
> >> release as such in JIRA so we can monitor them, discuss them on a case
> >> by case basis, and push the release date if necessary. Thanks.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 8/17/2020 2:51 PM, Janardhan wrote:
> >>> Hi,
> >>>
> >>> The following is the status of the MLContext test for algorithms.
> >>>
> >>> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
> >>> 2. Autoencoder, StepwiseReg - Scripts are not running
> >>> 3. KMeans, GLM (need to fix R) - No R script
> >>>
> >>> Thank you,
> >>> Janardhan
> >>>
> >>> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com>
> >> wrote:
> >>>
> >>>> thanks for the perspective, I think we should be very pragmatic
> >>>> regarding languages. Let's stick to DML as our domain-specific
> language
> >>>> with R-like syntax, but add language bindings such as the Python API
> >>>> (and others) to seamlessly plug into common data science workflows. A
> >>>> similar mind set worked very well in the internals too: Java for
> nicely
> >>>> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
> >>>> kernels and native libraries where necessary.
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 7/9/2020 3:54 PM, Janardhan wrote:
> >>>>> DML - %*% seems more Intuitive compared to @. Let us not change the
> >>>> syntax
> >>>>> ( our selling point easy porting to R! )
> >>>>> Python - no solid opinion
> >>>>>
> >>>>> - Janardhan
> >>>>>
> >>>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com>
> wrote:
> >>>>>
> >>>>>> for the Python API this is fine, for DML not as we should stick as
> >> close
> >>>>>> as possible to R syntax. Once we had a pydml syntax too, but this
> >>>>>> created lots of inconsistencies and could not use Python as a host
> >>>>>> language. So, I think restricting such changes to the Python API is
> a
> >>>>>> good path forward. Other opinions?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Matthias
> >>>>>>
> >>>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
> >>>>>>> Hi all
> >>>>>>>
> >>>>>>>
> >>>>>>> Can i suggest a radical change of matrix multiply.
> >>>>>>> to change the command from %*% to @.
> >>>>>>>
> >>>>>>> Python has made this commitment!
> >>>>>>>
> >>>>>>>
> >>>>>>> https://www.python.org/dev/peps/pep-0465/
> >>>>>>>
> >>>>>>>
> >>>>>>> or at least change this in the python API?
> >>>>>>>
> >>>>>>>
> >>>>>>> Best regards
> >>>>>>>
> >>>>>>> Sebastian
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: Matthias Boehm <mb...@gmail.com>
> >>>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
> >>>>>>> To: dev@systemds.apache.org
> >>>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0
> >>>> release
> >>>>>>> (feature freeze August 21). This should give us enough time to
> >> figure
> >>>>>>> out the list of things that still should go into this release as
> it's
> >>>> an
> >>>>>>> opportunity of a major for changes of external behavior. However,
> as
> >>>>>>> it's the first SystemDS Apache release, I think we should still
> stick
> >>>> to
> >>>>>>> Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for
> >>>>>>> subsequent releases. So, what do you think and any major features
> >> you'd
> >>>>>>> like to see complete for 2.0?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Matthias
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Matthias Boehm <mb...@gmail.com>.
thanks Arnab for looking over the remaining open issues. Together with 
Shafaq, we just came across two additional bugs related to eval function 
calls. These fixes should go into the RC and I intend to fix them as
soon as possible.

Regards,
Matthias

On 8/27/2020 8:41 PM, arnab phani wrote:
> Hi All,
> 
> Currently, I see only a few issues are flagged for 2.0 release. Can you
> please go through your open issues and check if the Fix-Version is set?
> Also, if a JIRA task doesn't exist for something you are working on or want
> to have in the coming release, please open a task and flag it for 2.0.
> 
> Regards,
> Arnab..
> 
> On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mb...@gmail.com> wrote:
> 
>> as the target release date end of August comes closer, I'd like to share
>> that Arnab Phani kindly volunteered in an offline discussion to act as
>> the release manager for our 2.0 release.
>>
>> Please, flag issues and features you think are important for the 2.0
>> release as such in JIRA so we can monitor them, discuss them on a case
>> by case basis, and push the release date if necessary. Thanks.
>>
>> Regards,
>> Matthias
>>
>> On 8/17/2020 2:51 PM, Janardhan wrote:
>>> Hi,
>>>
>>> The following is the status of the MLContext test for algorithms.
>>>
>>> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
>>> 2. Autoencoder, StepwiseReg - Scripts are not running
>>> 3. KMeans, GLM (need to fix R) - No R script
>>>
>>> Thank you,
>>> Janardhan
>>>
>>> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com>
>> wrote:
>>>
>>>> thanks for the perspective, I think we should be very pragmatic
>>>> regarding languages. Let's stick to DML as our domain-specific language
>>>> with R-like syntax, but add language bindings such as the Python API
>>>> (and others) to seamlessly plug into common data science workflows. A
>>>> similar mind set worked very well in the internals too: Java for nicely
>>>> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
>>>> kernels and native libraries where necessary.
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>> On 7/9/2020 3:54 PM, Janardhan wrote:
>>>>> DML - %*% seems more Intuitive compared to @. Let us not change the
>>>> syntax
>>>>> ( our selling point easy porting to R! )
>>>>> Python - no solid opinion
>>>>>
>>>>> - Janardhan
>>>>>
>>>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com> wrote:
>>>>>
>>>>>> for the Python API this is fine, for DML not as we should stick as
>> close
>>>>>> as possible to R syntax. Once we had a pydml syntax too, but this
>>>>>> created lots of inconsistencies and could not use Python as a host
>>>>>> language. So, I think restricting such changes to the Python API is a
>>>>>> good path forward. Other opinions?
>>>>>>
>>>>>> Regards,
>>>>>> Matthias
>>>>>>
>>>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>>
>>>>>>> Can i suggest a radical change of matrix multiply.
>>>>>>> to change the command from %*% to @.
>>>>>>>
>>>>>>> Python has made this commitment!
>>>>>>>
>>>>>>>
>>>>>>> https://www.python.org/dev/peps/pep-0465/
>>>>>>>
>>>>>>>
>>>>>>> or at least change this in the python API?
>>>>>>>
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Sebastian
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From: Matthias Boehm <mb...@gmail.com>
>>>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
>>>>>>> To: dev@systemds.apache.org
>>>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0
>>>> release
>>>>>>> (feature freeze August 21). This should give us enough time to
>> figure
>>>>>>> out the list of things that still should go into this release as it's
>>>> an
>>>>>>> opportunity of a major for changes of external behavior. However, as
>>>>>>> it's the first SystemDS Apache release, I think we should still stick
>>>> to
>>>>>>> Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for
>>>>>>> subsequent releases. So, what do you think and any major features
>> you'd
>>>>>>> like to see complete for 2.0?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Matthias
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by arnab phani <ph...@gmail.com>.
Hi All,

Currently, I see that only a few issues are flagged for the 2.0 release.
Can you please go through your open issues and check whether the
Fix-Version is set? Also, if a JIRA task doesn't exist for something you
are working on or want to have in the coming release, please open a task
and flag it for 2.0.

Regards,
Arnab..

On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mb...@gmail.com> wrote:

> as the target release date end of August comes closer, I'd like to share
> that Arnab Phani kindly volunteered in an offline discussion to act as
> the release manager for our 2.0 release.
>
> Please, flag issues and features you think are important for the 2.0
> release as such in JIRA so we can monitor them, discuss them on a case
> by case basis, and push the release date if necessary. Thanks.
>
> Regards,
> Matthias
>
> On 8/17/2020 2:51 PM, Janardhan wrote:
> > Hi,
> >
> > The following is the status of the MLContext test for algorithms.
> >
> > 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
> > 2. Autoencoder, StepwiseReg - Scripts are not running
> > 3. KMeans, GLM (need to fix R) - No R script
> >
> > Thank you,
> > Janardhan
> >
> > On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com>
> wrote:
> >
> >> thanks for the perspective, I think we should be very pragmatic
> >> regarding languages. Let's stick to DML as our domain-specific language
> >> with R-like syntax, but add language bindings such as the Python API
> >> (and others) to seamlessly plug into common data science workflows. A
> >> similar mind set worked very well in the internals too: Java for nicely
> >> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
> >> kernels and native libraries where necessary.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 7/9/2020 3:54 PM, Janardhan wrote:
> >>> DML - %*% seems more Intuitive compared to @. Let us not change the
> >> syntax
> >>> ( our selling point easy porting to R! )
> >>> Python - no solid opinion
> >>>
> >>> - Janardhan
> >>>
> >>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com> wrote:
> >>>
> >>>> for the Python API this is fine, for DML not as we should stick as
> close
> >>>> as possible to R syntax. Once we had a pydml syntax too, but this
> >>>> created lots of inconsistencies and could not use Python as a host
> >>>> language. So, I think restricting such changes to the Python API is a
> >>>> good path forward. Other opinions?
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
> >>>>> Hi all
> >>>>>
> >>>>>
> >>>>> Can i suggest a radical change of matrix multiply.
> >>>>> to change the command from %*% to @.
> >>>>>
> >>>>> Python has made this commitment!
> >>>>>
> >>>>>
> >>>>> https://www.python.org/dev/peps/pep-0465/
> >>>>>
> >>>>>
> >>>>> or at least change this in the python API?
> >>>>>
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>> Sebastian
> >>>>>
> >>>>> ________________________________
> >>>>> From: Matthias Boehm <mb...@gmail.com>
> >>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
> >>>>> To: dev@systemds.apache.org
> >>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0
> >> release
> >>>>> (feature freeze August 21). This should give us enough time to
> figure
> >>>>> out the list of things that still should go into this release as it's
> >> an
> >>>>> opportunity of a major for changes of external behavior. However, as
> >>>>> it's the first SystemDS Apache release, I think we should still stick
> >> to
> >>>>> Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for
> >>>>> subsequent releases. So, what do you think and any major features
> you'd
> >>>>> like to see complete for 2.0?
> >>>>>
> >>>>> Regards,
> >>>>> Matthias
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] Apache SystemDS 2.0 Release

Posted by Matthias Boehm <mb...@gmail.com>.
as the target release date (end of August) comes closer, I'd like to share
that Arnab Phani kindly volunteered in an offline discussion to act as
the release manager for our 2.0 release.

Please flag issues and features you think are important for the 2.0
release as such in JIRA so we can monitor them, discuss them on a
case-by-case basis, and push the release date if necessary. Thanks.

Regards,
Matthias

On 8/17/2020 2:51 PM, Janardhan wrote:
> Hi,
> 
> The following is the status of the MLContext test for algorithms.
> 
> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
> 2. Autoencoder, StepwiseReg - Scripts are not running
> 3. KMeans, GLM (need to fix R) - No R script
> 
> Thank you,
> Janardhan
> 
> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mb...@gmail.com> wrote:
> 
>> thanks for the perspective, I think we should be very pragmatic
>> regarding languages. Let's stick to DML as our domain-specific language
>> with R-like syntax, but add language bindings such as the Python API
>> (and others) to seamlessly plug into common data science workflows. A
>> similar mind set worked very well in the internals too: Java for nicely
>> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
>> kernels and native libraries where necessary.
>>
>> Regards,
>> Matthias
>>
>> On 7/9/2020 3:54 PM, Janardhan wrote:
>>> DML - %*% seems more Intuitive compared to @. Let us not change the
>> syntax
>>> ( our selling point easy porting to R! )
>>> Python - no solid opinion
>>>
>>> - Janardhan
>>>
>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mb...@gmail.com> wrote:
>>>
>>>> for the Python API this is fine, for DML not as we should stick as close
>>>> as possible to R syntax. Once we had a pydml syntax too, but this
>>>> created lots of inconsistencies and could not use Python as a host
>>>> language. So, I think restricting such changes to the Python API is a
>>>> good path forward. Other opinions?
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
>>>>> Hi all
>>>>>
>>>>>
>>>>> Can i suggest a radical change of matrix multiply.
>>>>> to change the command from %*% to @.
>>>>>
>>>>> Python has made this commitment!
>>>>>
>>>>>
>>>>> https://www.python.org/dev/peps/pep-0465/
>>>>>
>>>>>
>>>>> or at least change this in the python API?
>>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> Sebastian
>>>>>
>>>>> ________________________________
>>>>> From: Matthias Boehm <mb...@gmail.com>
>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
>>>>> To: dev@systemds.apache.org
>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0
>> release
>>>>> (feature freeze August 21). This should give us enough time to figure
>>>>> out the list of things that still should go into this release as it's
>> an
>>>>> opportunity of a major for changes of external behavior. However, as
>>>>> it's the first SystemDS Apache release, I think we should still stick
>> to
>>>>> Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for
>>>>> subsequent releases. So, what do you think and any major features you'd
>>>>> like to see complete for 2.0?
>>>>>
>>>>> Regards,
>>>>> Matthias
>>>>>
>>>>
>>>
>>
>
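
The option discussed in the quoted thread above, keeping %*% in DML while
offering @ only in the Python API, relies on the fact that PEP 465 lets a
Python class define the @ operator through __matmul__. The sketch below
shows one way a binding could translate @ into a DML %*% expression. The
MatrixExpr class and its dml attribute are hypothetical names invented for
this illustration; this is not the actual SystemDS Python API.

```python
class MatrixExpr:
    """Hypothetical lazy node that renders itself as a DML expression."""

    def __init__(self, dml):
        self.dml = dml  # DML source text for this (sub)expression

    def __matmul__(self, other):
        # Python's `@` (PEP 465) maps to DML's R-style `%*%`.
        return MatrixExpr(f"({self.dml} %*% {other.dml})")

    def __mul__(self, other):
        # Element-wise multiply uses `*` on both sides.
        return MatrixExpr(f"({self.dml} * {other.dml})")

    def __repr__(self):
        return f"MatrixExpr({self.dml!r})"


if __name__ == "__main__":
    X, W, Y = MatrixExpr("X"), MatrixExpr("W"), MatrixExpr("Y")
    print((X @ W).dml)
    print(((X @ W) * Y).dml)
```

With this kind of mapping, Python users write the operator they know from
NumPy, while the generated script keeps the R-like syntax that makes
porting from R easy.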