Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2016/08/03 04:45:20 UTC

[jira] [Comment Edited] (SPARK-7146) Should ML sharedParams be a public API?

    [ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405300#comment-15405300 ] 

Nicholas Chammas edited comment on SPARK-7146 at 8/3/16 4:45 AM:
-----------------------------------------------------------------

A quick update from a PySpark user: I am using HasInputCol, HasInputCols, HasLabelCol, and HasOutputCol to create custom transformers, and I find them very handy.

I know Python does not have a notion of "private" classes, but knowing these are part of the public API would be good.

In summary: The updated proposal looks good to me, with the caveat that I only just started learning the new ML Pipeline API.
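
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of how shared param mixins can be combined into a custom transformer. The two mixin classes below are simplified stand-ins written for this example, not the PySpark classes themselves (in PySpark, the real ones live in pyspark.ml.param.shared), and UpperCaser and its list-of-dicts input are hypothetical. It also shows the isinstance check that the "Cons" section of the proposal is concerned about.

```python
# Simplified stand-ins for the shared param mixins. These are NOT the
# PySpark classes; they only illustrate the mixin pattern.
class HasInputCol:
    """Mixin that gives a stage a standard input column parameter."""

    def setInputCol(self, value):
        self._inputCol = value
        return self  # return self so setters can be chained

    def getInputCol(self):
        return self._inputCol


class HasOutputCol:
    """Mixin that gives a stage a standard output column parameter."""

    def setOutputCol(self, value):
        self._outputCol = value
        return self

    def getOutputCol(self):
        return self._outputCol


class UpperCaser(HasInputCol, HasOutputCol):
    """Hypothetical custom transformer that upper-cases one column.

    Rows are plain dicts here (an assumption made to keep the sketch
    runnable without Spark); a real transformer would take a DataFrame.
    """

    def transform(self, rows):
        in_col = self.getInputCol()
        out_col = self.getOutputCol()
        # Copy each row and add the derived column under the output name.
        return [dict(row, **{out_col: row[in_col].upper()}) for row in rows]


stage = UpperCaser().setInputCol("text").setOutputCol("text_upper")
result = stage.transform([{"text": "spark"}])

# Generic code can test for the mixin, which is the kind of reliance on
# the traits that the proposal lists as a possible downside.
uses_input_col = isinstance(stage, HasInputCol)
```

Because both mixins use the same parameter names everywhere, any stage that inherits them exposes the same getters and setters, which is the standardization benefit listed under "Pros".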


> Should ML sharedParams be a public API?
> ---------------------------------------
>
>                 Key: SPARK-7146
>                 URL: https://issues.apache.org/jira/browse/SPARK-7146
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> Proposal: Make most of the Param traits in sharedParams.scala public.  Mark them as DeveloperApi.
> Pros:
> * Sharing the Param traits helps to encourage standardized Param names and documentation.
> Cons:
> * Users have to be careful since parameters can have different meanings for different algorithms.
> * If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.
> Currently, the shared params are private.
> h3. UPDATED proposal
> * Some Params are clearly safe to make public.  We will do so.
> * Some Params could be made public but may require caveats in the trait doc.
> * Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.
> *Public shared params*:
> * I/O column params
> ** HasFeaturesCol
> ** HasInputCol
> ** HasInputCols
> ** HasLabelCol
> ** HasOutputCol
> ** HasPredictionCol
> ** HasProbabilityCol
> ** HasRawPredictionCol
> ** HasVarianceCol
> ** HasWeightCol
> * Algorithm settings
> ** HasCheckpointInterval
> ** HasElasticNetParam
> ** HasFitIntercept
> ** HasMaxIter
> ** HasRegParam
> ** HasSeed
> ** HasStandardization (less common)
> ** HasStepSize
> ** HasTol
> *Questionable params*:
> * HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
> * HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)
> *Params to be removed from sharedParams*:
> * HasThreshold (only used in LogisticRegression)
> * HasThresholds (only used in ProbabilisticClassifier)


