Posted to dev@systemml.apache.org by alok singh <si...@hotmail.com> on 2017/09/21 23:38:40 UTC

[PROPOSAL] R4ML Integration with SystemML

Hi All,

This is in reference to the thread "[DISCUSS] R-Interface to SystemML". I am using a new email ID, so I can't reply to that thread directly.

Thanks, Deron, Matthias, and Niketan, for the feedback.


I will create the official proposal next week and send the details.

I will send a few emails now.

Here I am just copy-pasting the main points from each person, in stack order:

Matthias
----------
* Looking over the github repo, apparently R4ML is not under active
development/maintenance anymore (last commit Jul 20). So who would be
willing to maintain and extend it?

* Providing wrappers for our algorithm scripts would be just a start
because it hides our core value proposition of custom large-scale ML.
Hence, we would also need an MLContext equivalent that allows to execute
arbitrary DML scripts or R functions. Is there already a tentative design
of such an API and if not, who would like to take it over?

Deron
--------
Perhaps R4ML committers could supply a little more info? For instance:
> 1) Would they like to merge R4ML code into the main SystemML project
> itself? (Currently we have no modules.)
> 2) What would they like to merge?
> 3) If so, how do they propose to do so?
> 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> who would like to volunteer to do this?
> 5) Who will maintain the contributed code? Or who would like to volunteer
> to do this?
> 6) Documentation is needed (fit in SystemML documentation framework).
> 7) Testing is needed (fit into SystemML testing framework).
> 8) How is this packaged?
>

Niketan
----------

Also, comparing the features of R4ML with that of our Python APIs will be
useful as it might make a stronger case for R4ML.

Alok

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Deron Eriksson <de...@gmail.com>.
>
>
> > 1) Would they like to merge R4ML code into the main SystemML project
> > itself? (Currently we have no modules.)
> ALOK: R packages have to follow a set directory structure, and we might
> create more R packages later. There will be a subdirectory in SystemML
> called R (or something similar). Inside it there will be a subdirectory
> R4ML (one R package), and future R packages would be added as further
> subdirectories (more details later).
>
> > 2) What would they like to merge?
> ALOK: See 1).
> > 3) If so, how do they propose to do so?
> ALOK: I will explain in a future proposal email.
>
> > 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> > who would like to volunteer to do this?
> ALOK: I will do the majority of the work.
> > 5) Who will maintain the contributed code? Or who would like to volunteer
> > to do this?
> ALOK: Alok and Brendan will maintain it.
> > 6) Documentation is needed (fit in SystemML documentation framework).
> ALOK: As Brendan pointed out, R docs are different, and we will take care
> of them. The R documentation is self-contained.
>
> > 7) Testing is needed (fit into SystemML testing framework).
> ALOK: Testing will usually be done through a Maven exec of a system
> command, which just calls
> cd <somesubdir> ; ./bin/install_all
> > 8) How is this packaged?
> ALOK: As a subdirectory.
>
>

I think offering to add R4ML to SystemML and to maintain the codebase is
great. I think that addresses a couple of the main issues (how to get the
code into the project and how to maintain it).

Deron

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.
FYI, I couldn't reply to the older thread, so I needed to create this thread from a new Hotmail account.





From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 4:47 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML 
    
@niketan I think it will be different from the Python API. This is more of a product in which one can execute various R flows, plus many helper functions such as preprocessors (different from SystemML's), various samplers, and various utilities.

So calling it an interface is a misnomer.


Alok


________________________________
From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 4:45 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

Sorry, I missed Matthias's second question.


Yes, we can plan for it. Based on talking to a few R users, there is a question of having just DML execution capability versus R-like matrix ops, which we can discuss.

But currently it is not in the plan.


________________________________
From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 4:44 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

Answers:


* Looking over the github repo, apparently R4ML is not under active
development/maintenance anymore (last commit Jul 20). So who would be
willing to maintain and extend it?

ALOK: We will be doing development on it. There are open PRs already.

* Providing wrappers for our algorithm scripts would be just a start
because it hides our core value proposition of custom large-scale ML.
Hence, we would also need an MLContext equivalent that allows to execute
arbitrary DML scripts or R functions. Is there already a tentative design
of such an API and if not, who would like to take it over?

ALOK: Currently there is no out-of-the-box MLContext.

There is another proposal to expose linear algebra in R using a Matrix class, but that is on hold (I discussed it with Matthias).


> 1) Would they like to merge R4ML code into the main SystemML project
> itself? (Currently we have no modules.)
ALOK: R packages have to follow a set directory structure, and we might create more R packages later. There will be a subdirectory in SystemML called R (or something similar).
Inside it there will be a subdirectory R4ML (one R package), and future R packages would be added as further subdirectories (more details later).

> 2) What would they like to merge?
ALOK: See 1).
> 3) If so, how do they propose to do so?
ALOK: I will explain in a future proposal email.

> 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> who would like to volunteer to do this?
ALOK: I will do the majority of the work.
> 5) Who will maintain the contributed code? Or who would like to volunteer
> to do this?
ALOK: Alok and Brendan will maintain it.
> 6) Documentation is needed (fit in SystemML documentation framework).
ALOK: As Brendan pointed out, R docs are different, and we will take care of them. The R documentation is self-contained.

> 7) Testing is needed (fit into SystemML testing framework).
ALOK: Testing will usually be done through a Maven exec of a system command, which just calls
cd <somesubdir> ; ./bin/install_all
> 8) How is this packaged?
ALOK: As a subdirectory.
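
The answer to question 7 describes the test step as a system command that changes into a package subdirectory and runs ./bin/install_all. A minimal Python sketch of that step might look like the following. The function name run_package_install and its package_dir argument are hypothetical, introduced only for illustration; the actual Maven configuration and the contents of install_all are not shown in this thread.

```python
import subprocess
from pathlib import Path


def run_package_install(package_dir):
    """Run <package_dir>/bin/install_all from inside package_dir.

    package_dir is a placeholder for the <somesubdir> mentioned above.
    Returns the exit code of the script.
    """
    package_dir = Path(package_dir).resolve()
    script = package_dir / "bin" / "install_all"
    if not script.is_file():
        raise FileNotFoundError(f"install script not found: {script}")
    # Equivalent of `cd <somesubdir> ; ./bin/install_all`.
    completed = subprocess.run([str(script)], cwd=str(package_dir))
    # A nonzero exit code is what a build tool would treat as a failed step.
    return completed.returncode
```

A build tool would typically fail the build when this returns a nonzero exit code, which is how an R-side install-and-test script could be tied into the existing build without rewriting the R tests.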
    

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Frederick R Reiss <fr...@us.ibm.com>.
+1 to Mike's comments below. Having the public API for invoking custom DML
code from R be as close as possible to the Python and Java variants of
MLContext is a good idea and would allow us to have an additional "R" tab
in the MLContext online documentation. The code already checked into R4ML's
sysml.bridge.R is pretty close already, it's just not a public API right
now. And the existing algorithm wrappers do provide a more R-user-friendly
way to call the pre-built algorithms, similar to the way that the existing
mllearn API gives Python users a scipy-like experience.

Fred



From:	dusenberrymw@gmail.com
To:	dev@systemml.apache.org
Date:	09/22/2017 06:24 PM
Subject:	Re: [PROPOSAL] R4ML Integration with SystemML



Adding an R interface to SystemML would be great.  I would suggest, in
agreement with others here, that the MLContext API be exposed in the R4ML
package so that users *could* run arbitrary DML code from R.  Past that, I
wouldn't worry about making the rest exactly compatible with mllearn at
this point.  All languages, and their associated communities, have
different approaches to libraries, so it would make sense that there may be
pieces that are specific to a certain language.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


On Sep 22, 2017, at 3:47 PM, Deron Eriksson <de...@gmail.com> wrote:

>>
>>>> So I was thinking is it absolutely must have to sync between api?
>>
>> Soft-yes, we should try our best to do so.
>>
>
> There are many benefits both for SystemML users and developers to having
> the APIs be as consistent as possible. Based on user feedback, I know
> Niketan and Glenn did a lot of work recently to make the Python MLContext
> API much more consistent with the Java/Scala MLContext API. I think there
> is an expectation from SystemML users that code that utilizes one API will
> act in a similar fashion with as few modifications as possible if migrated
> to a different language.
>
> As an example of a benefit to SystemML developers, if an R MLContext API is
> consistent with the Scala and Python APIs, an R tab can be added to
> http://apache.github.io/systemml/spark-mlcontext-programming-guide.html and
> most of the MLContext documentation can be reused across the different
> languages. This greatly simplifies the creation and maintenance of
> documentation, which is very important with a project as large as SystemML.
> In addition, consistency across MLContext APIs in different languages
> simplifies code maintenance since a developer familiar with the API
> features of one language can probably work without too much difficulty on
> one of the other language APIs in the project. This would not be the case
> if the APIs were significantly divergent.
>
> Deron



Re: [PROPOSAL] R4ML Integration with SystemML

Posted by du...@gmail.com.
Adding an R interface to SystemML would be great.  I would suggest, in agreement with others here, that the MLContext API be exposed in the R4ML package so that users *could* run arbitrary DML code from R.  Past that, I wouldn't worry about making the rest exactly compatible with mllearn at this point.  All languages, and their associated communities, have different approaches to libraries, so it would make sense that there may be pieces that are specific to a certain language.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Deron Eriksson <de...@gmail.com>.
>
> >> So I was thinking is it absolutely must have to sync between api?
>
> Soft-yes, we should try our best to do so.
>

There are many benefits both for SystemML users and developers to having
the APIs be as consistent as possible. Based on user feedback, I know
Niketan and Glenn did a lot of work recently to make the Python MLContext
API much more consistent with the Java/Scala MLContext API. I think there
is an expectation from SystemML users that code that utilizes one API will
act in a similar fashion with as few modifications as possible if migrated
to a different language.

As an example of a benefit to SystemML developers, if an R MLContext API is
consistent with the Scala and Python APIs, an R tab can be added to
http://apache.github.io/systemml/spark-mlcontext-programming-guide.html and
most of the MLContext documentation can be reused across the different
languages. This greatly simplifies the creation and maintenance of
documentation, which is very important with a project as large as SystemML.
In addition, consistency across MLContext APIs in different languages
simplifies code maintenance since a developer familiar with the API
features of one language can probably work without too much difficulty on
one of the other language APIs in the project. This would not be the case
if the APIs were significantly divergent.

Deron

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.


________________________________
From: Niketan Pansare <np...@us.ibm.com>
Sent: Friday, September 22, 2017 3:10 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML


>> a) Does it mean you are proposing spliting R4ML into two R-wrapper and R4ML?

I was only suggesting how you ought to stage the PRs into SystemML once the vote passes :)

Alok: I am wondering whether, if we do this and have to push to CRAN, the CRAN requirement that all dependencies also be on CRAN might be an issue.

>> So I was thinking is it absolutely must have to sync between api?

Soft-yes, we should try our best to do so.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: alok singh <si...@hotmail.com>
To: "dev@systemml.apache.org" <de...@systemml.apache.org>
Date: 09/22/2017 02:40 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

________________________________




see comments Alok:



From: Niketan Pansare <np...@us.ibm.com>
Sent: Friday, September 22, 2017 2:11 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

>>> As pointed out earlier, R4ML is not just an R interface; it is based on an earlier IBM product for R and has many product features.
Also note that the pure MLContext and the command-line options for DML do not ideally allow all the things a user wants to do in their ML code.
The solution could be to create wrappers to make the user happy. We have created those wrappers, but they are in R, and from the user's point of view it feels like they are just writing R code.
If the ultimate goal is to have just an MLContext-based R interface, then I think it undermines R4ML's value proposition.
(We can definitely just expose the MLContext API. However, calling the Logistic Regression example just for the purpose of MLContext won't be best.) R4ML.mlogit has better APIs.

Maybe we are not on the same page.
(a) MLContext is not the only API, but an important one that needs to be supported.
(b) Like R4ML, our mllearn wrappers aim to simplify usage for Python users. These wrappers were designed so that if someone wrote a Python script that uses scikit-learn or MLlib, then a simple change from `from sklearn.linear_model import LogisticRegression` to `from systemml.mllearn import LogisticRegression` should in principle allow SystemML to be incorporated into their workflow.
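
The drop-in idea in (b) rests on the wrapper exposing the same fit/predict interface as the library it replaces, so that the user's own code does not change. A minimal, self-contained Python sketch of that design follows. Both classes are hypothetical stand-ins written for illustration: neither is scikit-learn or systemml.mllearn code, and the toy "majority label" model is an assumption chosen only to keep the example runnable.

```python
from collections import Counter


class LocalMajorityClassifier:
    """Hypothetical stand-in for a single-node, scikit-learn-style estimator."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class WrappedMajorityClassifier:
    """Hypothetical stand-in for a wrapper with the same interface.

    A real wrapper would hand the work to a scalable backend; this one only
    mimics the interface so the example stays self-contained.
    """

    def __init__(self, backend="hypothetical-backend"):
        self.backend = backend

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def train_and_predict(estimator, X_train, y_train, X_new):
    """User workflow that relies only on the shared fit/predict interface."""
    return estimator.fit(X_train, y_train).predict(X_new)
```

Because train_and_predict touches only fit and predict, swapping one estimator class for the other is the only change the user makes, which is the property the import swap described in (b) relies on.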

Alok:
a) Does this mean you are proposing splitting R4ML into two parts, an R wrapper and R4ML? I think that is an idea one could look into. I second it. That way one can have a pure R wrapper and an mllearn-like R4ML.

b) Currently we can certainly expose MLContext from R as a public API, but using it involves many convolutions to make life easier for the R user. For example, see the functions *execute*, *output*, and *getDF* in https://github.com/aloknsingh/r4ml/blob/0d79b3c7975be55989466869fe99ccfd47dd6dc3/R4ML/R/sysml.bridge.R




>> 1) I think it will require a lot of work for the Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to do the coding in R4ML. But I think the goal was to merge this project.

I guess the goal is to make SystemML better and more user-friendly. To do that, we have to try our best to keep our APIs consistent across languages. I understand it might require a lot of work for the Scala and Python APIs to be in sync with the R4ML API, but it has to be done.


Since R4ML was designed in isolation from the SystemML project, I am recommending a gradual merge of (1) the additional features and (2) the features that diverge from SystemML APIs so as to be R-friendly, thus allowing the SystemML community to comment on them before merging. This also allows the R4ML features that match one-to-one with the Python and Scala APIs to be merged quickly, rather than sitting in the PR until we agree on every feature in (1) and (2) :)

Alok: See the previous comments. I think we should explore the idea of splitting, the way you split mllearn. More discussion is still needed as I see it. At this stage, those changes would require a complete change to R4ML.

Another way to think about it is that R4ML could be an independent package, which would eventually be pushed to CRAN.
Note that in the Spark dev repo, Spark core, SparkR, and Python each live in a separate directory.

Initially, Spark's Scala API, SparkR, and PySpark tried to stay in sync, but I think many features have now been added that are not kept in sync between SparkR and PySpark, and similarly between the Scala API and SparkR and PySpark.

So I was wondering: is it absolutely a must to keep the APIs in sync? They will all cater to different users.

These are ideas.




Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: alok singh <si...@hotmail.com>
To: "dev@systemml.apache.org" <de...@systemml.apache.org>, "deron@apache.org" <de...@apache.org>
Date: 09/22/2017 12:30 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML





Here are Niketan's questions

Thanks for taking the time to answer our questions and also for considering helping the SystemML community. I have a couple more questions:

Niketan: 1.
In case there is an inconsistency, do you (as R4ML developers) feel comfortable changing the R4ML interface to be compatible with our other APIs? Maybe you can go over the two links below and imagine adding a corresponding R tab:
- MLContext Programming guide: http://apache.github.io/systemml/spark-mlcontext-programming-guide
- Algorithm wrappers: http://apache.github.io/systemml/algorithms-classification.html#multinomial-logistic-regression

ALOK: Hi Niketan

As pointed out earlier, R4ML is not just an R interface: it is based on an earlier IBM product for R and has many product features.

Also note that the pure MLCtx and the command-line options for DML do not ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make users happy. We have already created those wrappers, but they are in R, and from the user's point of view it feels like they are just writing R code.

See some of the examples at

https://github.com/SparkTC/r4ml/tree/master/R4ML/inst/examples
https://github.com/SparkTC/r4ml/blob/master/R4ML/inst/examples/r4ml.demo.mlogit.R

NOTE: R4ML uses a combination of SparkR, DML, and R to give the best user experience.

If the ultimate goal is to have just an MLCtx-based R interface, then I think it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic Regression example just for the sake of MLCtx won't be best.) R4ML.mlogit has better APIs.




Niketan: 2. Other than providing an R interface to SystemML like the above APIs, what additional features/code does R4ML plan to add to SystemML? Just as we want the R API to be functionally complete with respect to our Python and Scala APIs, we want the Python and Scala APIs to be functionally complete with respect to the R API. So a discussion on supporting the additional features in the Python and Scala APIs is required :)

ALOK: As discussed in point 1), I think it will require a lot of work for the Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to do the coding in R4ML.

But I think the goal was to merge this project.

It would be nice if @Fred could also comment.

Thanks
Alok



From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 7:32 PM
To: dev@systemml.apache.org; deron@apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

Hi

We (Brendan and I) have been focusing on other things, like journeys, apart from the new MLCtx changes. You can also review the R4ML commits and PRs.
I think the code will definitely be maintained.

Alok





From: Deron Eriksson <de...@gmail.com>
Sent: Thursday, September 21, 2017 6:03 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.




> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLCtx.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require an API that currently exists in the project.

That said, I believe an R API interface to SystemML is extremely valuable
and I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June (https://github.com/SparkTC/r4ml/pull/50).


Deron









Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Niketan Pansare <np...@us.ibm.com>.
>> a) Does it mean you are proposing splitting R4ML into two, an R wrapper and
R4ML?

I was only suggesting how you ought to stage the PRs into SystemML once the
vote passes :)

>> So I was wondering: is it absolutely necessary to keep the APIs in sync?

Soft-yes, we should try our best to do so.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	alok singh <si...@hotmail.com>
To:	"dev@systemml.apache.org" <de...@systemml.apache.org>
Date:	09/22/2017 02:40 PM
Subject:	Re: [PROPOSAL] R4ML Integration with SystemML




see comments Alok:



From: Niketan Pansare <np...@us.ibm.com>
Sent: Friday, September 22, 2017 2:11 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

>>> As pointed out earlier, R4ML is not just an R interface: it is based on
an earlier IBM product for R and has many product features.
Also note that the pure MLCtx and the command-line options for DML do not
ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make users happy. We have
already created those wrappers, but they are in R, and from the user's point
of view it feels like they are just writing R code.
If the ultimate goal is to have just an MLCtx-based R interface, then I think
it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic
Regression example just for the sake of MLCtx won't be best.) R4ML.mlogit
has better APIs.

Maybe we are not on the same page.
(a) MLContext is not the only API, but it is an important one that needs to
be supported.
(b) Like R4ML, our mllearn wrappers aim to simplify usage for Python
users. These wrappers were designed so that, if someone wrote a Python
script that uses scikit-learn or MLlib, a simple change from
`from sklearn.linear_model import LogisticRegression` to
`from systemml.mllearn import LogisticRegression` should in principle allow
SystemML to be incorporated into their workflow.

Alok:
a) Does it mean you are proposing splitting R4ML into two, an R wrapper and
R4ML? I think that is an idea one could potentially look into. I second
it. That way one can have a pure R wrapper and an mllearn-like R4ML.

b) Currently we can certainly expose MLContext from R as a public API, but
using all of the code involves many convolutions to make life easier for the
R user. For example, see the functions *execute*, *output*, and *getDF* in
https://github.com/aloknsingh/r4ml/blob/0d79b3c7975be55989466869fe99ccfd47dd6dc3/R4ML/R/sysml.bridge.R


>> 1) I think it will require a lot of work for the Scala and Python APIs to
be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have
to do the coding in R4ML. But I think the goal was to merge this project.

I guess the goal is to make SystemML better and more user-friendly. To do
that, we have to try our best to keep our APIs consistent across languages.
I understand it might require a lot of work for the Scala and Python APIs to
be in sync with the R4ML API, but it has to be done.


Since R4ML was designed in isolation from the SystemML project, I recommend
a gradual merge of (1) the additional features and (2) the features that
diverge from SystemML APIs so as to be R-friendly, thus allowing the
SystemML community to comment on them before merging. This also allows the
R4ML features that match one-to-one with the Python and Scala APIs to be
merged quickly, rather than sitting in the PR until we agree on every
feature in (1) and (2) :)

Alok: See the previous comments. I think we should explore the idea of
splitting R4ML the way you split mllearn. As I see it, more discussion is
still needed. At this stage, those changes would require a complete rework
of R4ML.

Another way to think about it is that R4ML can be an independent package,
which would eventually be pushed to CRAN.
Note that in the Spark dev repo, Spark core, SparkR, and Python each live in
a separate directory.

Initially, Spark Scala, SparkR, and PySpark tried to stay in sync, but I
think many features have since been added that are not kept in sync between
SparkR and PySpark, and similarly between Spark Scala, SparkR, and PySpark.

So I was wondering: is it absolutely necessary to keep the APIs in sync?
After all, they cater to different users.

These are ideas.




Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: alok singh <si...@hotmail.com>
To: "dev@systemml.apache.org" <de...@systemml.apache.org>, "deron@apache.org"
<de...@apache.org>
Date: 09/22/2017 12:30 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML





Here are Niketan's questions

Thanks for taking the time to answer our questions and also for considering
helping the SystemML community. I have a couple more questions:

Niketan: 1.
In case there is an inconsistency, do you (as R4ML developers) feel
comfortable changing the R4ML interface to be compatible with our other
APIs? Maybe you can go over the two links below and imagine adding a
corresponding R tab:
- MLContext Programming guide:
http://apache.github.io/systemml/spark-mlcontext-programming-guide
- Algorithm wrappers:
http://apache.github.io/systemml/algorithms-classification.html#multinomial-logistic-regression


ALOK: Hi Niketan

As pointed out earlier, R4ML is not just an R interface: it is based on an
earlier IBM product for R and has many product features.

Also note that the pure MLCtx and the command-line options for DML do not
ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make users happy. We have
already created those wrappers, but they are in R, and from the user's point
of view it feels like they are just writing R code.

See some of the examples at

https://github.com/SparkTC/r4ml/tree/master/R4ML/inst/examples
https://github.com/SparkTC/r4ml/blob/master/R4ML/inst/examples/r4ml.demo.mlogit.R

NOTE: R4ML uses a combination of SparkR, DML, and R to give the best user
experience.

If the ultimate goal is to have just an MLCtx-based R interface, then I think
it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic
Regression example just for the sake of MLCtx won't be best.) R4ML.mlogit
has better APIs.




Niketan: 2. Other than providing an R interface to SystemML like the above
APIs, what additional features/code does R4ML plan to add to SystemML? Just
as we want the R API to be functionally complete with respect to our Python
and Scala APIs, we want the Python and Scala APIs to be functionally
complete with respect to the R API. So a discussion on supporting the
additional features in the Python and Scala APIs is required :)

ALOK: As discussed in point 1), I think it will require a lot of work for
the Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have
to do the coding in R4ML.

But I think the goal was to merge this project.

It would be nice if @Fred could also comment.

Thanks
Alok



From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 7:32 PM
To: dev@systemml.apache.org; deron@apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

Hi

We (Brendan and I) have been focusing on other things, like journeys, apart
from the new MLCtx changes. You can also review the R4ML commits and PRs.
I think the code will definitely be maintained.

Alok





From: Deron Eriksson <de...@gmail.com>
Sent: Thursday, September 21, 2017 6:03 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML

>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.




> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLCtx.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require an API that currently exists in the project.

That said, I believe an R API interface to SystemML is extremely valuable
and I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June (https://github.com/SparkTC/r4ml/pull/50).


Deron








Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.
see comments Alok:



From: Niketan Pansare <np...@us.ibm.com>
Sent: Friday, September 22, 2017 2:11 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
  
>>> As pointed out earlier, R4ML is not just an R interface: it is based on an earlier IBM product for R and has many product features.
Also note that the pure MLCtx and the command-line options for DML do not ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make users happy. We have already created those wrappers, but they are in R, and from the user's point of view it feels like they are just writing R code.
If the ultimate goal is to have just an MLCtx-based R interface, then I think it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic Regression example just for the sake of MLCtx won't be best.) R4ML.mlogit has better APIs.

Maybe we are not on the same page.
(a) MLContext is not the only API, but it is an important one that needs to be supported.
(b) Like R4ML, our mllearn wrappers aim to simplify usage for Python users. These wrappers were designed so that, if someone wrote a Python script that uses scikit-learn or MLlib, a simple change from `from sklearn.linear_model import LogisticRegression` to `from systemml.mllearn import LogisticRegression` should in principle allow SystemML to be incorporated into their workflow.

Alok:
a) Does it mean you are proposing splitting R4ML into two, an R wrapper and R4ML? I think that is an idea one could potentially look into. I second it. That way one can have a pure R wrapper and an mllearn-like R4ML.

b) Currently we can certainly expose MLContext from R as a public API, but using all of the code involves many convolutions to make life easier for the R user. For example, see the functions *execute*, *output*, and *getDF* in https://github.com/aloknsingh/r4ml/blob/0d79b3c7975be55989466869fe99ccfd47dd6dc3/R4ML/R/sysml.bridge.R

>> 1) I think it will require a lot of work for the Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to do the coding in R4ML. But I think the goal was to merge this project.

I guess the goal is to make SystemML better and more user-friendly. To do that, we have to try our best to keep our APIs consistent across languages. I understand it might require a lot of work for the Scala and Python APIs to be in sync with the R4ML API, but it has to be done.


Since R4ML was designed in isolation from the SystemML project, I recommend a gradual merge of (1) the additional features and (2) the features that diverge from SystemML APIs so as to be R-friendly, thus allowing the SystemML community to comment on them before merging. This also allows the R4ML features that match one-to-one with the Python and Scala APIs to be merged quickly, rather than sitting in the PR until we agree on every feature in (1) and (2) :)

Alok: See the previous comments. I think we should explore the idea of splitting R4ML the way you split mllearn. As I see it, more discussion is still needed. At this stage, those changes would require a complete rework of R4ML.

Another way to think about it is that R4ML can be an independent package, which would eventually be pushed to CRAN.
Note that in the Spark dev repo, Spark core, SparkR, and Python each live in a separate directory.

Initially, Spark Scala, SparkR, and PySpark tried to stay in sync, but I think many features have since been added that are not kept in sync between SparkR and PySpark, and similarly between Spark Scala, SparkR, and PySpark.

So I was wondering: is it absolutely necessary to keep the APIs in sync? After all, they cater to different users.

These are ideas.




Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: alok singh <si...@hotmail.com>
To: "dev@systemml.apache.org" <de...@systemml.apache.org>, "deron@apache.org" <de...@apache.org>
Date: 09/22/2017 12:30 PM
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
 




Here are Niketan's questions

Thanks for taking the time to answer our questions and also for considering helping the SystemML community. I have a couple more questions:

Niketan: 1.
In case there is an inconsistency, do you (as R4ML developers) feel comfortable changing the R4ML interface to be compatible with our other APIs? Maybe you can go over the two links below and imagine adding a corresponding R tab:
- MLContext Programming guide: http://apache.github.io/systemml/spark-mlcontext-programming-guide
- Algorithm wrappers: http://apache.github.io/systemml/algorithms-classification.html#multinomial-logistic-regression

ALOK: Hi Niketan

As pointed out earlier, R4ML is not just an R interface: it is based on an earlier IBM product for R and has many product features.

Also note that the pure MLCtx and the command-line options for DML do not ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make users happy. We have already created those wrappers, but they are in R, and from the user's point of view it feels like they are just writing R code.

See some of the examples at

https://github.com/SparkTC/r4ml/tree/master/R4ML/inst/examples
https://github.com/SparkTC/r4ml/blob/master/R4ML/inst/examples/r4ml.demo.mlogit.R

NOTE: R4ML uses a combination of SparkR, DML, and R to give the best user experience.

If the ultimate goal is to have just an MLCtx-based R interface, then I think it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic Regression example just for the sake of MLCtx won't be best.) R4ML.mlogit has better APIs.




Niketan: 2. Other than providing an R interface to SystemML like the above APIs, what additional features/code does R4ML plan to add to SystemML? Just as we want the R API to be functionally complete with respect to our Python and Scala APIs, we want the Python and Scala APIs to be functionally complete with respect to the R API. So a discussion on supporting the additional features in the Python and Scala APIs is required :)

ALOK: As discussed in point 1), I think it will require a lot of work for the Scala and Python APIs to be in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to do the coding in R4ML.

But I think the goal was to merge this project.

It would be nice if @Fred could also comment.

Thanks
Alok



From: alok singh <si...@hotmail.com>
Sent: Thursday, September 21, 2017 7:32 PM
To: dev@systemml.apache.org; deron@apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
    
Hi 

We (Brendan and I) have been focusing on other things, like journeys, apart from the new MLCtx changes. You can also review the R4ML commits and PRs.
I think the code will definitely be maintained.

Alok





From: Deron Eriksson <de...@gmail.com>
Sent: Thursday, September 21, 2017 6:03 PM
To: dev@systemml.apache.org
Subject: Re: [PROPOSAL] R4ML Integration with SystemML
    
>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.




> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLCtx.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require an API that currently exists in the project.

That said, I believe an R API interface to SystemML is extremely valuable
and I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June (https://github.com/SparkTC/r4ml/pull/50).


Deron
        



   

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Niketan Pansare <np...@us.ibm.com>.
>>> As pointed out earlier, R4ML is not just an R interface; it is based on
an earlier IBM product for R and has many product features.
Also note that the pure MLCtx and the command-line options for DML do not
ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make the user happy. We have
already created those wrappers, but they are in R, so from the user's point
of view it feels like they are just writing R code.
If the ultimate goal is to have just an MLCtx-based R interface, then I
think it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic
Regression example just for the purpose of MLCtx won't be best.) R4ML.mlogit
has better APIs.

Maybe we are not on the same page.
(a) MLContext is not the only API, but it is an important one that needs to
be supported.
(b) Like R4ML, our mllearn wrappers aim to simplify usage for Python users.
These wrappers were designed so that, if someone has written a Python
script that uses scikit-learn or MLlib, a simple change from
`from sklearn.linear_model import LogisticRegression` to
`from systemml.mllearn import LogisticRegression` should in principle allow
SystemML to be incorporated into their workflow.
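
To illustrate the drop-in idea in (b), here is a minimal sketch, not code from SystemML or R4ML. The helper `load_logistic_regression` and its `preferred` parameter are hypothetical names introduced only for illustration; the one assumption about the real libraries is that `systemml.mllearn` and `sklearn.linear_model` each export a class named `LogisticRegression`. The sketch resolves the class from whichever backend can be imported, so the calling code changes only where the estimator comes from:

```python
import importlib


def load_logistic_regression(preferred=("systemml.mllearn", "sklearn.linear_model")):
    """Return (module_name, cls) for the first importable backend.

    Hypothetical helper for illustration only. Returns (None, None) when
    none of the listed modules provides a LogisticRegression class.
    """
    for module_name in preferred:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Backend not installed (or one of its dependencies is missing).
            continue
        cls = getattr(module, "LogisticRegression", None)
        if cls is not None:
            return module_name, cls
    return None, None


if __name__ == "__main__":
    backend, estimator_cls = load_logistic_regression()
    if estimator_cls is None:
        print("No LogisticRegression backend is installed")
    else:
        print("Using LogisticRegression from", backend)
```

Constructor arguments may still differ between backends (as far as I recall, the SystemML wrapper expects a Spark session), so the swap is drop-in only "in principle", as stated above.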

>> 1) I think it will require a lot of work to bring the Scala and Python
APIs in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have
to do the coding on the R4ML side. But I think the goal was to merge this
project.

I guess the goal is to make SystemML better and more user-friendly. To do
that, we have to try our best to keep our APIs consistent across languages.
I understand it might require a lot of work for the Scala and Python APIs to
be in sync with the R4ML API, but it has to be done.

Since R4ML was designed in isolation from the SystemML project, I recommend
a gradual merge of (1) the additional features and (2) the features that
diverge from the SystemML APIs so as to be R friendly, thus allowing the
SystemML community to comment on them before merging. This also allows the
R4ML features that match one-to-one with the Python and Scala APIs to be
merged quickly, rather than sitting in a PR until we agree on every feature
of types (1) and (2) :)

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar







Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.

Here are Niketan's questions:

Thanks for taking the time to answer our questions and for considering helping the SystemML community. I have a couple more questions:

Niketan: 1. In case there is an inconsistency, do you (as R4ML developers) feel comfortable changing the R4ML interface to be compatible with our other APIs? Maybe you can go over the two links below and imagine adding a corresponding R tab:
- MLContext Programming guide: http://apache.github.io/systemml/spark-mlcontext-programming-guide


- Algorithm wrappers: http://apache.github.io/systemml/algorithms-classification.html#multinomial-logistic-regression

ALOK: Hi Niketan,

As pointed out earlier, R4ML is not just an R interface; it is based on an earlier IBM product for R and has many product features.

Also note that the pure MLCtx and the command-line options for DML do not ideally allow everything a user wants to do in their ML code.
The solution could be to create wrappers to make the user happy. We have already created those wrappers, but they are in R, so from the user's point of view it feels like they are just writing R code.

See some of the examples at:

https://github.com/SparkTC/r4ml/tree/master/R4ML/inst/examples
https://github.com/SparkTC/r4ml/blob/master/R4ML/inst/examples/r4ml.demo.mlogit.R

NOTE: R4ML uses a combination of SparkR, DML, and R to give the best user experience.

If the ultimate goal is to have just an MLCtx-based R interface, then I think it undermines R4ML's value proposition.
(We can definitely just expose the MLCtx API. However, calling the Logistic Regression example just for the purpose of MLCtx won't be best.) R4ML.mlogit has better APIs.



Niketan: 2. Other than providing an R interface to SystemML through the above APIs, what additional features/code does R4ML plan to add to SystemML? Just as we want the R API to be functionally complete with respect to our Python and Scala APIs, we want the Python and Scala APIs to be functionally complete with respect to the R API. So a discussion on supporting the additional features in the Python and Scala APIs is required :)

ALOK: As discussed in point 1), I think it will require a lot of work to bring the Scala and Python APIs in sync with the R4ML API.
Also, I feel that if the goal is to have just Python and Scala, then we have to do the coding on the R4ML side.

But I think the goal was to merge this project.

It would be nice if @Fred could comment as well.

Thanks
Alok



        

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.
Hi,

Brendan and I have been focusing on other things, like journeys, apart from the new MLCtx changes. You can also review the R4ML commits and PRs.
I think the code will definitely be maintained.

Alok





    

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by Deron Eriksson <de...@gmail.com>.
>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.




> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows executing
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLCtx.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require an API that currently exists in the project.

That said, I believe an R API interface to SystemML is extremely valuable
and I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June (https://github.com/SparkTC/r4ml/pull/50).

Deron

Re: [PROPOSAL] R4ML Integration with SystemML

Posted by alok singh <si...@hotmail.com>.
Answers:


* Looking over the github repo, apparently R4ML is not under active
development/maintenance anymore (last commit Jul 20). So who would be
willing to maintain and extend it?

ALOK: We will be doing development on it. There are open PRs already.

* Providing wrappers for our algorithm scripts would be just a start
because it hides our core value proposition of custom large-scale ML.
Hence, we would also need an MLContext equivalent that allows executing
arbitrary DML scripts or R functions. Is there already a tentative design
of such an API and if not, who would like to take it over?

ALOK: Currently there is no out-of-the-box MLCtx.

There is another proposal to expose linear algebra in R using a Matrix class, but that is on hold (I have discussed it with Matthias).


> 1) Would they like to merge R4ML code into the main SystemML project
> itself? (Currently we have no modules.)
ALOK: In R we have to follow a particular package directory structure. We might be able to create more R packages. There will be a subdirectory in SystemML called R (or something similar); in that subdirectory there will be a subdirectory R4ML (one R package), and in the future more R packages as further subdirectories (more details later).

> 2) What would they like to merge?
ALOK: See 1).
> 3) If so, how do they propose to do so?
ALOK: I will explain in the upcoming proposal email.

> 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> who would like to volunteer to do this?
ALOK: I will do the majority of the work.
> 5) Who will maintain the contributed code? Or who would like to volunteer
> to do this?
ALOK: Alok and Brendan will maintain it.
> 6) Documentation is needed (fit in SystemML documentation framework).
ALOK: As Brendan pointed out, R docs are different, and we will take care of them. They are self-contained.

> 7) Testing is needed (fit into SystemML testing framework).
ALOK: Testing will usually be done through a Maven system command exec, where it just calls
cd <somesubdir> ; ./bin/install_all
> 8) How is this packaged?
ALOK: As a subdirectory.