Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:54:06 UTC
[jira] [Updated] (SPARK-28958) pyspark.ml function parity
[ https://issues.apache.org/jira/browse/SPARK-28958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28958:
----------------------------------
Affects Version/s: (was: 3.0.0)
3.1.0
> pyspark.ml function parity
> --------------------------
>
> Key: SPARK-28958
> URL: https://issues.apache.org/jira/browse/SPARK-28958
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Priority: Major
> Attachments: ML_SYNC.pdf
>
>
> I looked into the hierarchy of both the Python and Scala sides, and found that they are quite different, which damages parity and makes the codebase hard to maintain.
> The main inconvenience is that most models in pyspark do not support any param getters or setters.
> In the py side, I think we need to do:
> 1, remove setters generated by _shared_params_code_gen.py;
> 2, add common abstract classes like the scala side, such as JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier;
> 3, for each alg, add its param trait, such as LinearSVCParams;
> 4, since the shared params do not have setters, we need to add setters in the right places;
> Unfortunately, I notice that if we do 1 (remove setters generated by _shared_params_code_gen.py), all algs (classification/regression/clustering/features/fpm/recommendation) need to be modified in one batch.
> The scala side also needs some small improvements, but I think they can be left alone at first
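The layered structure proposed in the description can be sketched in plain Python. This is a simplified stand-in, not the actual pyspark.ml code: `Param`, `Params`, `HasMaxIter`, `HasRegParam`, `_LinearSVCParams`, `LinearSVC`, and `LinearSVCModel` here are minimal mock-ups that only illustrate the intended shape, where shared-param mixins carry getters only, a per-algorithm params trait groups them, and setters are added on the estimator and model where they belong.

```python
# Simplified sketch (NOT the real pyspark.ml classes) of the proposed
# hierarchy: shared-param mixins expose only getters (steps 1 and 4),
# a per-algorithm param trait groups them (step 3), and both the
# estimator and the model mix that trait in (step 2).

class Param:
    """Minimal stand-in for a pyspark.ml Param descriptor."""
    def __init__(self, name, doc):
        self.name = name
        self.doc = doc

class Params:
    """Minimal stand-in for the pyspark.ml Params base class."""
    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        self._paramMap.update(kwargs)
        return self

    def getOrDefault(self, name):
        return self._paramMap.get(name)

class HasMaxIter(Params):
    """Shared-param mixin: getter only, no generated setter."""
    maxIter = Param("maxIter", "max number of iterations (>= 0).")

    def getMaxIter(self):
        return self.getOrDefault("maxIter")

class HasRegParam(Params):
    """Shared-param mixin: getter only, no generated setter."""
    regParam = Param("regParam", "regularization parameter (>= 0).")

    def getRegParam(self):
        return self.getOrDefault("regParam")

class _LinearSVCParams(HasMaxIter, HasRegParam):
    """Per-algorithm param trait, shared by estimator and model."""
    pass

class LinearSVC(_LinearSVCParams):
    """Estimator: setters are defined here, in the right place."""
    def setMaxIter(self, value):
        return self._set(maxIter=value)

    def setRegParam(self, value):
        return self._set(regParam=value)

class LinearSVCModel(_LinearSVCParams):
    """Model: inherits the same getters, so params stay readable."""
    pass

svc = LinearSVC().setMaxIter(100).setRegParam(0.01)
print(svc.getMaxIter(), svc.getRegParam())

model = LinearSVCModel()._set(maxIter=100)
print(model.getMaxIter())
```

Because both `LinearSVC` and `LinearSVCModel` inherit the same `_LinearSVCParams` trait, the model gets every getter for free, which is exactly the parity gap the issue describes on the Python side.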
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org