Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2018/06/22 18:48:00 UTC

[jira] [Updated] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

     [ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-24632:
--------------------------------------
    Description: 
This is a follow-up for [SPARK-17025], which allowed users to implement Python PipelineStages in 3rd-party libraries, include them in Pipelines, and use Pipeline persistence.  This task is to make it easier for 3rd-party libraries to have PipelineStages written in Java and then to use pyspark.ml abstractions to create wrappers around those Java classes.  This is currently possible, except that users hit bugs around persistence.
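
For concreteness, here is a minimal sketch of the kind of 3rd-party wrapper this task targets, following roughly the same pattern Spark's own Java-backed stages use. The "mylib" / "com.example" / "MyJavaTransformer" names are hypothetical; only the pyspark.ml mixins and helpers are real:

{code:python}
# Hypothetical 3rd-party wrapper around a Java Transformer, reusing pyspark.ml's
# Java-wrapper abstractions. "MyJavaTransformer" and "com.example.mylib" are
# illustrative names, not part of Spark.
from pyspark import keyword_only
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.ml.util import JavaMLReadable, JavaMLWritable
from pyspark.ml.wrapper import JavaTransformer


class MyJavaTransformer(JavaTransformer, HasInputCol, HasOutputCol,
                        JavaMLReadable, JavaMLWritable):
    """Python wrapper for a (hypothetical) Java class com.example.mylib.MyJavaTransformer."""

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        super(MyJavaTransformer, self).__init__()
        # Create the backing Java object on the JVM side.
        self._java_obj = self._new_java_obj(
            "com.example.mylib.MyJavaTransformer", self.uid)
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, inputCol=None, outputCol=None):
        kwargs = self._input_kwargs
        return self._set(**kwargs)
{code}

Using such a wrapper for transform() already works; the problems appear when saving and loading it (and Pipelines containing it), as described below.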

Some fixes we'll need include:
* an overridable method for converting between Python and Java class paths (fully qualified class names); see https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284 and the sketch after this list
* a corresponding fix in the Pipeline persistence code path: https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378
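
As a rough illustration of the first item (the hook below is a sketch, not a committed API): the default JavaMLReader derives the Java class name by rewriting the Python module name from "pyspark" to "org.apache.spark", which cannot work for classes defined outside the pyspark package. An overridable mapping could look roughly like this:

{code:python}
# Sketch only: one possible shape for an overridable Python-to-Java class-name
# mapping. "_java_class_name" is a hypothetical attribute, not an existing API.
from pyspark.ml.util import JavaMLReader


class ThirdPartyJavaMLReader(JavaMLReader):

    @classmethod
    def _java_loader_class(cls, clazz):
        # If the wrapper declares its backing Java class explicitly, use that
        # instead of assuming the "pyspark" -> "org.apache.spark" rewrite.
        if hasattr(clazz, "_java_class_name"):
            return clazz._java_class_name
        return super(ThirdPartyJavaMLReader, cls)._java_loader_class(clazz)
{code}

A wrapper like the one sketched earlier could then set _java_class_name = "com.example.mylib.MyJavaTransformer" and have its read() classmethod return ThirdPartyJavaMLReader(cls); Pipeline persistence would need an analogous extension point.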

One unusual thing for this task will be to write unit tests which test a custom PipelineStage written outside of the pyspark package.
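
A sketch of what such a test could look like (package and class names are hypothetical; the point is that the stage is defined in a module outside pyspark, so the default module-name mapping does not apply):

{code:python}
# Sketch of a persistence round-trip test for a stage defined outside pyspark.
# "mylib" and MyJavaTransformer are the hypothetical names from the example above.
import shutil
import tempfile
import unittest

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

from mylib.transformers import MyJavaTransformer  # hypothetical 3rd-party package


class ThirdPartyStagePersistenceTest(unittest.TestCase):

    def test_pipeline_round_trip(self):
        spark = SparkSession.builder.master("local[2]").getOrCreate()
        path = tempfile.mkdtemp()
        try:
            stage = MyJavaTransformer(inputCol="in", outputCol="out")
            pipeline = Pipeline(stages=[stage])
            pipeline.write().overwrite().save(path)
            # The load step is where 3rd-party Java-backed stages currently hit problems.
            loaded = Pipeline.load(path)
            self.assertEqual(loaded.getStages()[0].uid, stage.uid)
        finally:
            shutil.rmtree(path, ignore_errors=True)
            spark.stop()
{code}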

  was:
This is a follow-up for [SPARK-17025], which allowed users to implement Python PipelineStages in 3rd-party libraries, include them in Pipelines, and use Pipeline persistence.  This task is to make it easier for 3rd-party libraries to have PipelineStages written in Java and then to use pyspark.ml abstractions to create wrappers around those Java classes.  This is currently possible, except that users hit bugs around persistence.

One fix we'll need is an overridable method for converting between Python and Java classpaths. See https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284

One unusual thing for this task will be to write unit tests which test a custom PipelineStage written outside of the pyspark package.


> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24632
>                 URL: https://issues.apache.org/jira/browse/SPARK-24632
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement Python PipelineStages in 3rd-party libraries, include them in Pipelines, and use Pipeline persistence.  This task is to make it easier for 3rd-party libraries to have PipelineStages written in Java and then to use pyspark.ml abstractions to create wrappers around those Java classes.  This is currently possible, except that users hit bugs around persistence.
> Some fixes we'll need include:
> * an overridable method for converting between Python and Java classpaths. See https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
> * https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378
> One unusual thing for this task will be to write unit tests which test a custom PipelineStage written outside of the pyspark package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
