You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2015/12/02 20:51:11 UTC
[jira] [Commented] (SPARK-11219) Make Parameter Description Format Consistent in PySpark.MLlib

    [ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036470#comment-15036470 ] 

Bryan Cutler commented on SPARK-11219:
--------------------------------------

I added an assessment of the current state of param descriptions for algorithms/models in pyspark.mllib.  To keep changes well separated, I will make sub-tasks for each Python file, except for FPM and Recommendation which are small and can probably be combined.

> Make Parameter Description Format Consistent in PySpark.MLlib
> -------------------------------------------------------------
>
>                 Key: SPARK-11219
>                 URL: https://issues.apache.org/jira/browse/SPARK-11219
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, MLlib, PySpark
>            Reporter: Bryan Cutler
>            Priority: Trivial
>
> There are several different formats for describing params in PySpark.MLlib, making it unclear what the preferred way to document is, i.e. vertical alignment vs single line.
> This is to agree on a format and make it consistent across PySpark.MLlib.
> Following the discussion in SPARK-10560, using 2 lines with an indentation is both readable and doesn't lead to changing many lines when adding/removing parameters.  If the parameter uses a default value, put this in parenthesis in a new line under the description.
> Example:
> {noformat}
> :param stepSize:
>   Step size for each iteration of gradient descent.
>   (default: 0.1)
> :param numIterations:
>   Number of iterations run for each batch of data.
>   (default: 50)
> {noformat}
> h2. Current State of Parameter Description Formating
> h4. Classification
>   * LogisticRegressionModel - single line descriptions, fix indentations
>   * LogisticRegressionWithSGD - vertical alignment, sporatic default values
>   * LogisticRegressionWithLBFGS - vertical alignment, sporatic default values
>   * SVMModel - single line
>   * SVMWithSGD - vertical alignment, sporatic default values
>   * NaiveBayesModel - single line
>   * NaiveBayes - single line
> h4. Clustering
>   * KMeansModel - missing param description
>   * KMeans - missing param description and defaults
>   * GaussianMixture - vertical align, incorrect default formatting
>   * PowerIterationClustering - single line with wrapped indentation, missing defaults
>   * StreamingKMeansModel - single line wrapped
>   * StreamingKMeans - single line wrapped, missing defaults
>   * LDAModel - single line
>   * LDA - vertical align, mising some defaults
> h4. FPM  
>   * FPGrowth - single line
>   * PrefixSpan - single line, defaults values in backticks
> h4. Recommendation
>   * ALS - does not have param descriptions
> h4. Regression
>   * LabeledPoint - single line
>   * LinearModel - single line
>   * LinearRegressionWithSGD - vertical alignment
>   * RidgeRegressionWithSGD - vertical align
>   * IsotonicRegressionModel - single line
>   * IsotonicRegression - single line, missing default
> h4. Tree
>   * DecisionTree - single line with vertical indentation, missing defaults
>   * RandomForest - single line with wrapped indent, missing some defaults
>   * GradientBoostedTrees - single line with wrapped indent
> NOTE
> This issue will just focus on model/algorithm descriptions, which are the largest source of inconsistent formatting
> evaluation.py, feature.py, random.py, utils.py - these supporting classes have param descriptions as single line, but are consistent so don't need to be changed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org