You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "holdenk (JIRA)" <ji...@apache.org> on 2016/11/01 15:19:58 UTC

[jira] [Commented] (SPARK-7146) Should ML sharedParams be a public API?

    [ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625685#comment-15625685 ] 

holdenk commented on SPARK-7146:
--------------------------------

I think it might be reasonable to just expose it as Scala traits and mark it as a developer API since we don't really expect this to be used directly by end users but more by library developers. What do y'all think?

> Should ML sharedParams be a public API?
> ---------------------------------------
>
>                 Key: SPARK-7146
>                 URL: https://issues.apache.org/jira/browse/SPARK-7146
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> Proposal: Make most of the Param traits in sharedParams.scala public.  Mark them as DeveloperApi.
> Pros:
> * Sharing the Param traits helps to encourage standardized Param names and documentation.
> Cons:
> * Users have to be careful since parameters can have different meanings for different algorithms.
> * If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.
> Currently, the shared params are private.
> h3. UPDATED proposal
> * Some Params are clearly safe to make public.  We will do so.
> * Some Params could be made public but may require caveats in the trait doc.
> * Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.
> *Public shared params*:
> * I/O column params
> ** HasFeaturesCol
> ** HasInputCol
> ** HasInputCols
> ** HasLabelCol
> ** HasOutputCol
> ** HasPredictionCol
> ** HasProbabilityCol
> ** HasRawPredictionCol
> ** HasVarianceCol
> ** HasWeightCol
> * Algorithm settings
> ** HasCheckpointInterval
> ** HasElasticNetParam
> ** HasFitIntercept
> ** HasMaxIter
> ** HasRegParam
> ** HasSeed
> ** HasStandardization (less common)
> ** HasStepSize
> ** HasTol
> *Questionable params*:
> * HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
> * HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)
> *Params to be removed from sharedParams*:
> * HasThreshold (only used in LogisticRegression)
> * HasThresholds (only used in ProbabilisticClassifier)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org