Posted to dev@systemml.apache.org by Janardhan Pulivarthi <ja...@gmail.com> on 2017/08/23 16:30:20 UTC

Gentle ping for help on all my PRs. Thanks.

Dear committers,

I feel that my contributions are not well organized, so I am listing them
here, along with the help I need from volunteers.

1. [SYSTEMML-1437] - factorization machines
<https://github.com/apache/systemml/pull/526> [*in progress*]
     - So far, I have implemented the `*fm.dml*` core module.
     - *Help:* I am unclear about what the input/output should look like for
the example implementations, such as regression. A sample script would help
me complete all the examples: *regression*, *classification*, and *ranking*.


2. [SYSTEMML-1645] - Verify whether all scripts work with MLContext &
automate <https://github.com/apache/systemml/pull/589> [*in progress*]
    - This PR adds test scripts for all top-level algorithms for the new
MLContext.
    - I am working with *Jerome* on this. Once he verifies all the scripts,
I will add the tests for them.
    - *Help:* Can anybody help review this PR and suggest what is missing?
I am getting script execution failures.

3. [SYSTEMML-1444] - UDFs w/ single output in expressions
<https://github.com/apache/systemml/pull/603> [*in progress*]
   - The objective is to make UDFs callable from expressions. I've gone
through the Hop and Lop implementations, the compiler, the parser, and the
API to get a clear picture.
   - I am still making my way through this.
   - *Help:* Hi Matthias, I tried implementing a lop, *FunctionCallCPSingle.java*
<https://github.com/apache/systemml/pull/603/files#diff-1fb71e441518b2859963b386b1869711>

4. [SYSTEMML-1216] - implement local svd( ) function
<https://github.com/apache/systemml/pull/605> [*done*]
   - Building on the previously implemented local svd(), I've added small
improvements and tests.
   - *Help:* I believe this is ready to be merged.

5. [SYSTEMML-1214] Implement scalable version of singular value
decomposition <https://github.com/apache/systemml/pull/612> [*issue with
the testing?*]
    - This PR depends on the PR above. It implements distributed SVD on top
of the already implemented distributed qr(), and then computes the local
svd() of the resulting R matrix.
    - *Help:* I have a preliminary implementation, but how should I test for
scalability? Can I do that on *Jenkins CI*, or do I need to run it on a
cluster?

6. [SYSTEMML-979] Add support for Bayesian Optimization.
<https://github.com/apache/systemml/pull/632> [*a lot to be done*]
    - A lot of work remains, but I have implemented a skeleton with rough
syntax (I'll improve this soon).
    - Many improvements are still needed: better operations need to be put in
place, and loops need to be eliminated completely.
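
For reference on item 5, the QR-then-local-SVD idea can be sketched on a
single machine as below. This is only a minimal NumPy sketch, not the code in
PR 612: the function name `svd_via_qr` is hypothetical, `np.linalg.qr` stands
in for the distributed qr(), and the input is assumed to be a tall matrix
(at least as many rows as columns).

```python
import numpy as np


def svd_via_qr(A):
    """SVD of a tall matrix A (rows >= columns) via a QR step.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Step 1: reduced QR, A = Q R with Q (m x n) and R (n x n).
    # In a distributed setting this is the step that touches all rows.
    Q, R = np.linalg.qr(A)

    # Step 2: R is only n x n, so its SVD can be computed locally.
    U_r, s, Vt = np.linalg.svd(R)

    # Step 3: A = Q R = (Q U_r) diag(s) Vt, so the left vectors are Q U_r.
    U = Q @ U_r
    return U, s, Vt


# Small check on random data: the factors should reconstruct A.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
U, s, Vt = svd_via_qr(A)
print(np.allclose(U @ np.diag(s) @ Vt, A))
```

A check like this only verifies correctness on small inputs; it says nothing
about scalability, which would still need a run on larger data.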

Thanks all for the support,
Janardhan

Re: Gentle ping for help on all my PRs. Thanks.

Posted by Deron Eriksson <de...@gmail.com>.

Hi Janardhan,

Thank you for working on #2 [SYSTEMML-1645].

It looks like there are around 30 algorithms in the scripts directory.
Verifying that each of these algorithms works with MLContext in a
reproducible form, by creating MLContext tests for each one, is an enormous
undertaking for one person and is very difficult to handle in a single pull
request.

I would recommend:

1. Rather than working on all algorithms at once, instead focus on
   algorithms one-at-a-time.
2. If a JIRA to create an MLContext test for this algorithm does not exist
   under [SYSTEMML-1645], create it.
3. Start by selecting shorter algorithm scripts (perhaps under 300 lines)
   such as linear regression. GLM is very difficult (~1200 lines).
4. Create a test class with test cases for this algorithm (this can be
   similar to MLContextUnivariateStatisticsTest).
5. This test class must compile locally.
6. Verify that the test cases run locally. You should be able to do this
   locally in your IDE or also using maven.
7. Once the test cases work locally, commit your changes and create a pull
   request for this single test class.
8. (Optionally, you could also label the pull request with [WIP] and ask for
   feedback if you are stuck. The mailing list is probably an even better
   place to ask for feedback.)
9. Assuming the test suite passes, members of the community may give some
   additional feedback which can be incorporated into the pull request.
10. At this stage, the pull request for the single algorithm can be merged.

This can be repeated for each algorithm (or group of closely related
algorithms). This may mean the creation of 20 pull requests rather than 1
enormous pull request.

Additionally, note that we have certain JIRAs such as SYSTEMML-1646. This
JIRA says that LinearRegCG.dml and LinearRegDS.dml work with MLContext.
However, no MLContext test class exists for this JIRA, so this result can't
be automatically reproduced anywhere. It would be very beneficial to have an
MLContextLinRegTest (such as you created on PR 589). So you might want to
have SYSTEMML-1646 either reopened and assigned to you, or you could create
another JIRA issue for your MLContextLinRegTest class. Then, create a
branch for this MLContextLinRegTest, make your commits, and open a pull
request for this single test class. Once this pull request has been
accepted and merged, select another algorithm and create a corresponding
test class.

So, my advice would be to close PR 589. You can use the identical work you
did there as the basis for other pull requests, but divide the work into 1
JIRA issue/1 branch/1 pull request for each algorithm (or closely related
algorithms like LinearRegCG and LinearRegDS). I recommend focusing on only
one algorithm at a time. I think MLContextLinRegTest is probably a good
place to start.

Thanks for all the hard work!

Deron
