Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/12/27 19:25:58 UTC

[jira] [Commented] (SPARK-18757) Models in Pyspark support column setters

    [ https://issues.apache.org/jira/browse/SPARK-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781096#comment-15781096 ] 

Joseph K. Bradley commented on SPARK-18757:
-------------------------------------------

I think it's useful to make the Python types match the Scala ones.

However, I want to avoid introducing more abstract classes in Scala unless they are really useful. We've run into issues with the existing ones: some (especially Classifier and ProbabilisticClassifier) need to be specialized so heavily in subclasses that they are probably not worth the trouble. At some point, I hope we can replace them with traits and eliminate the "shared" implementations that are not actually shareable across all subclasses. To see what I mean, check out the overrides in LogisticRegression.
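
To make the trade-off concrete, here is a minimal Python sketch of the trait-style alternative: each column setter lives in its own small mixin, and a model picks up only the ones it needs instead of inheriting them from a shared abstract base. All class names below (the `...Sketch` classes) are hypothetical and are not Spark's actual classes; this only illustrates the shape of the idea, not how PySpark implements its params.

```python
# Hypothetical sketch: every name here is illustrative, not PySpark's real API.


class HasFeaturesColSketch:
    """Mixin that contributes only the featuresCol getter and setter."""

    _features_col = "features"  # class-level default

    def setFeaturesCol(self, value):
        self._features_col = value
        return self  # return self so that setters can be chained

    def getFeaturesCol(self):
        return self._features_col


class HasPredictionColSketch:
    """Mixin that contributes only the predictionCol getter and setter."""

    _prediction_col = "prediction"

    def setPredictionCol(self, value):
        self._prediction_col = value
        return self

    def getPredictionCol(self):
        return self._prediction_col


class HasProbabilityColSketch:
    """Mixin that contributes only the probabilityCol getter and setter."""

    _probability_col = "probability"

    def setProbabilityCol(self, value):
        self._probability_col = value
        return self

    def getProbabilityCol(self):
        return self._probability_col


class ModelSketch:
    """Stand-in for a bare Model base class that has no column setters."""


class KMeansModelSketch(ModelSketch, HasFeaturesColSketch, HasPredictionColSketch):
    """Hard-clustering model: features and prediction columns only."""


class GaussianMixtureModelSketch(
    ModelSketch,
    HasFeaturesColSketch,
    HasPredictionColSketch,
    HasProbabilityColSketch,
):
    """Soft-clustering model: additionally exposes a probability column."""


kmeans = KMeansModelSketch().setFeaturesCol("scaled").setPredictionCol("cluster")
gmm = GaussianMixtureModelSketch().setProbabilityCol("membership")
```

In this sketch no setter is written by hand on either model, and the hard-clustering model never acquires a probability setter it would have to override or disable.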

> Models in Pyspark support column setters
> ----------------------------------------
>
>                 Key: SPARK-18757
>                 URL: https://issues.apache.org/jira/browse/SPARK-18757
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML, PySpark
>            Reporter: zhengruifeng
>
> Recently, I found three places in which column setters are missing: KMeansModel, BisectingKMeansModel and OneVsRestModel.
> These three models inherit directly from `Model`, which doesn't have column setters, so I had to add the missing setters manually in [SPARK-18625] and [SPARK-18520].
> For now, models in PySpark still don't support column setters at all.
> I suggest that we keep the hierarchy of PySpark models in line with that on the Scala side:
> For classification and regression algorithms, I'm making a trial in [SPARK-18739], in which I try to copy the hierarchy from the Scala side.
> For clustering algorithms, I think we may first create abstract classes {{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, and make the clustering algorithms inherit from them. Then, on the Python side, we copy the hierarchy so that we don't need to add setters manually for each algorithm.
> For feature algorithms, we can also use an abstract class {{FeatureModel}} on the Scala side, and do the same thing.
> What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org