Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2016/03/19 00:03:42 UTC

[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/11835

    [SPARK-13951] Add nested Pipeline load/save supports in PySpark

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-13951
    
    1. Scala side `Pipeline` changes:
      
      * Change `Param[Array[PipelineStage]]` to `StageArrayParam` to support a Java-compatible function.
    
    2. Python side changes:
    
      * wrapper: Add a `JavaConvertible` to support those stages that are not a `JavaWrapper`.
    
      * wrapper: Add a `ConvertUtil` to support the Python-Scala converting for both `JavaWrapper` and `JavaConvertible`.
    
      * pipeline: `Pipeline` and `PipelineModel` now extend from `JavaConvertible`.
    
      * pipeline: `PipelineMLReader`, `PipelineMLWriter`, `PipelineModelMLReader`, and `PipelineModelMLWriter` now use `ConvertUtils`.
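
The split described in the list above (stages that wrap a single JVM object versus pipeline-like stages that contain other stages) can be pictured with a small pure-Python sketch. This is only an illustration under stated assumptions: `LeafStage`, `CompositeStage`, `convert`, and `to_java` are hypothetical stand-ins, not the classes or methods added by this patch, and the tuples stand in for JVM handles.

```python
# Toy stand-ins only: none of these names are PySpark's real classes.


class LeafStage:
    """Plays the role of a stage backed by a single JVM object."""

    def __init__(self, uid):
        self.uid = uid

    def to_java(self):
        # A real wrapper would return its JVM handle; here, a tagged tuple.
        return ("java_stage", self.uid)


class CompositeStage:
    """Plays the role of a pipeline-like stage built from other stages."""

    def __init__(self, uid, stages):
        self.uid = uid
        self.stages = list(stages)

    def to_java(self):
        # Convert the children first, so nesting to any depth is handled.
        return ("java_pipeline", self.uid, [convert(s) for s in self.stages])


def convert(stage):
    """Single entry point that accepts either kind of stage."""
    if isinstance(stage, (LeafStage, CompositeStage)):
        return stage.to_java()
    raise TypeError("cannot convert %r" % (stage,))


inner = CompositeStage("inner", [LeafStage("tokenizer")])
outer = CompositeStage("outer", [inner, LeafStage("lr")])
result = convert(outer)
```

Routing every stage through one helper is what lets a pipeline hold another pipeline as a stage: the outer conversion does not need to know which kind of child it is looking at.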
    
    
    ## How was this patch tested?
    
    Tested with Python unit tests for both pipeline save/load and nested-pipeline save/load.
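
The shape of such a round-trip test can be sketched without Spark. The example below is a toy under stated assumptions: a leaf stage is just a string, a pipeline is a Python list, and `save`, `load`, `round_trip`, and the `metadata.json` / `stage_<i>` layout are hypothetical names chosen for illustration, not the on-disk format or API used by the patch.

```python
import json
import os
import tempfile


def save(stage, path):
    """Save a stage; a list is treated as a (possibly nested) pipeline."""
    os.makedirs(path)
    if isinstance(stage, list):
        meta = {"kind": "pipeline", "size": len(stage)}
    else:
        meta = {"kind": "leaf", "name": stage}
    with open(os.path.join(path, "metadata.json"), "w") as f:
        json.dump(meta, f)
    if isinstance(stage, list):
        # Each child gets its own subdirectory, so nesting recurses naturally.
        for i, child in enumerate(stage):
            save(child, os.path.join(path, "stage_%d" % i))


def load(path):
    """Rebuild whatever `save` wrote under `path`."""
    with open(os.path.join(path, "metadata.json")) as f:
        meta = json.load(f)
    if meta["kind"] == "leaf":
        return meta["name"]
    return [load(os.path.join(path, "stage_%d" % i)) for i in range(meta["size"])]


def round_trip(stage):
    """Save to a temporary directory and load the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "saved")
        save(stage, target)
        return load(target)


flat = ["tokenizer", "hashing_tf"]
nested = ["tokenizer", ["hashing_tf", ["lr"]]]
assert round_trip(flat) == flat
assert round_trip(nested) == nested
```

The nested case is the one that matters here: it only passes if both the writer and the reader recurse into child pipelines instead of assuming every stage is a leaf.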


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-13951

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11835
    
----
commit 459c073610608738ed9102f57a88e31a38c87db3
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-17T06:57:23Z

    relax read side

commit 6f035ca2f4c67ad69ee702aaf7894dda60b38add
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-17T06:58:27Z

    relax write side

commit d13bf3a499c83f8841982f1ec83ed2273d204caf
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-18T00:02:38Z

    add test

commit 9e9bf2f1f161f6863e2ca80e524f3dc84b25ccf4
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-18T07:19:25Z

    version 1, fix nested pipeline load/save

commit ce202a2db73886414e79f16e17af8231fa9751a2
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-18T22:00:00Z

    another step to add JavaConvertible

commit d0ae8e2aaca91019f98180933b8e236ac2e891c7
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-18T22:39:53Z

    fix all

commit 4a009902025d0ce954e049fd73fad73547757fff
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-18T22:56:16Z

    merge with master

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-199480657
  
    @yinxusen Thanks very much for working on this issue.  After looking at it, I think we should use a different approach entirely in Python.  I also found some issues with the current Python ML persistence setup, which I'd like to clean up.  I'd like to take over this issue.  Could you please close this PR and, if you have time, help review my PR for it: [https://github.com/apache/spark/pull/11866]?
    
    Thanks a lot for your understanding.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198579714
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198634432
  
    @jkbradley 
    
    MiMa tests failed because of the change to `StageArrayParam`. But I think we need a new Param like `ArrayParam[T]` with the Java-compatible `w` function. Otherwise, it's hard to build an array param on the Python side.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198579682
  
    **[Test build #53580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53580/consoleFull)** for PR 11835 at commit [`4a00990`](https://github.com/apache/spark/commit/4a009902025d0ce954e049fd73fad73547757fff).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-199408457
  
    I think we should be able to implement this using Python only, and definitely without breaking the Scala/Java API.  Let me prototype a little, and get back to you.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198572852
  
    test it please




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-199364659
  
    I'll take a look




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11835




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198579716
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53580/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13951] Add nested Pipeline load/save su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11835#issuecomment-198575308
  
    **[Test build #53580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53580/consoleFull)** for PR 11835 at commit [`4a00990`](https://github.com/apache/spark/commit/4a009902025d0ce954e049fd73fad73547757fff).

