Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2015/12/02 20:51:11 UTC
[jira] [Commented] (SPARK-11219) Make Parameter Description Format Consistent in PySpark.MLlib
[ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036470#comment-15036470 ]
Bryan Cutler commented on SPARK-11219:
--------------------------------------
I added an assessment of the current state of param descriptions for algorithms/models in pyspark.mllib. To keep changes well separated, I will make sub-tasks for each Python file, except for FPM and Recommendation, which are small and can probably be combined.
> Make Parameter Description Format Consistent in PySpark.MLlib
> -------------------------------------------------------------
>
> Key: SPARK-11219
> URL: https://issues.apache.org/jira/browse/SPARK-11219
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, MLlib, PySpark
> Reporter: Bryan Cutler
> Priority: Trivial
>
> There are several different formats for describing params in PySpark.MLlib, making it unclear what the preferred way to document is, i.e. vertical alignment vs single line.
> This is to agree on a format and make it consistent across PySpark.MLlib.
> Following the discussion in SPARK-10560, using 2 lines with an indentation is both readable and doesn't lead to changing many lines when adding/removing parameters. If the parameter uses a default value, put this in parentheses on a new line under the description.
> Example:
> {noformat}
> :param stepSize:
>   Step size for each iteration of gradient descent.
>   (default: 0.1)
> :param numIterations:
>   Number of iterations run for each batch of data.
>   (default: 50)
> {noformat}
> h2. Current State of Parameter Description Formatting
> h4. Classification
> * LogisticRegressionModel - single line descriptions, fix indentations
> * LogisticRegressionWithSGD - vertical alignment, sporadic default values
> * LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
> * SVMModel - single line
> * SVMWithSGD - vertical alignment, sporadic default values
> * NaiveBayesModel - single line
> * NaiveBayes - single line
> h4. Clustering
> * KMeansModel - missing param description
> * KMeans - missing param description and defaults
> * GaussianMixture - vertical align, incorrect default formatting
> * PowerIterationClustering - single line with wrapped indentation, missing defaults
> * StreamingKMeansModel - single line wrapped
> * StreamingKMeans - single line wrapped, missing defaults
> * LDAModel - single line
> * LDA - vertical align, missing some defaults
> h4. FPM
> * FPGrowth - single line
> * PrefixSpan - single line, default values in backticks
> h4. Recommendation
> * ALS - does not have param descriptions
> h4. Regression
> * LabeledPoint - single line
> * LinearModel - single line
> * LinearRegressionWithSGD - vertical alignment
> * RidgeRegressionWithSGD - vertical align
> * IsotonicRegressionModel - single line
> * IsotonicRegression - single line, missing default
> h4. Tree
> * DecisionTree - single line with vertical indentation, missing defaults
> * RandomForest - single line with wrapped indent, missing some defaults
> * GradientBoostedTrees - single line with wrapped indent
> NOTE
> This issue will focus only on model/algorithm descriptions, which are the largest source of inconsistent formatting.
> evaluation.py, feature.py, random.py, utils.py - these supporting classes have single-line param descriptions, but they are consistent, so they don't need to be changed.
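As an illustration only, the two-line format proposed in the description could look like this on a Python function. This is a minimal sketch: the train function, its data parameter, the two-space indent width, and the param_defaults helper are hypothetical and are not taken from pyspark.mllib. The helper simply collects the "(default: ...)" line for each documented parameter, which is one possible way to spot missing defaults.

```python
import re


def train(data, stepSize=0.1, numIterations=50):
    """
    Hypothetical training stub, used only to show the docstring layout.

    :param data:
      Training data (placeholder, unused here).
    :param stepSize:
      Step size for each iteration of gradient descent.
      (default: 0.1)
    :param numIterations:
      Number of iterations run for each batch of data.
      (default: 50)
    """
    return stepSize, numIterations


def param_defaults(doc):
    """Map each documented param to its '(default: ...)' text, or None."""
    defaults = {}
    current = None
    for line in doc.splitlines():
        stripped = line.strip()
        # A ':param name:' line on its own starts a new parameter entry.
        header = re.match(r":param (\w+):$", stripped)
        if header:
            current = header.group(1)
            defaults[current] = None
            continue
        # A '(default: ...)' line belongs to the most recent parameter.
        default = re.match(r"\(default: (.+)\)$", stripped)
        if default and current is not None:
            defaults[current] = default.group(1)
    return defaults
```

Calling param_defaults(train.__doc__) returns a dict keyed by parameter name; parameters without a documented default map to None.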
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)