Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2018/01/20 03:01:00 UTC

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

    [ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332698#comment-16332698 ] 

Bryan Cutler edited comment on SPARK-23109 at 1/20/18 3:00 AM:
---------------------------------------------------------------

I did the following: generated the HTML docs and checked them for consistency with Scala, looked for API-breaking changes (none found), checked for missing items (see list below), and checked that default param values match. No blocking or major issues found.

Items requiring follow-up (I will create related JIRAs to fix them):

classification:
    GBTClassifier - missing featureSubsetStrategy, should be moved to TreeEnsembleParams
    GBTClassificationModel - missing numClasses, should inherit from JavaClassificationModel
    https://issues.apache.org/jira/browse/SPARK-23161 for the above

clustering:
    GaussianMixtureModel - missing gaussians, need to serialize Array[MultivariateGaussian]?
    LDAModel - missing topicsMatrix - can send Matrix through Py4J?

evaluation:
    ClusteringEvaluator - DOC describe silhouette like scaladoc

feature:
    Bucketizer - multiple input/output cols, splitsArray - https://issues.apache.org/jira/browse/SPARK-22797
    ChiSqSelector - DOC selectorType desc missing new types
    QuantileDiscretizer - multiple input/output cols - https://issues.apache.org/jira/browse/SPARK-22796

fpm:
    DOC associationRules should say return "DataFrame"

image:
    missing columnSchema, get*; Scala missing toNDArray

regression:
    LinearRegressionSummary - missing r2adj

stat:
    missing Summarizer class - https://issues.apache.org/jira/browse/SPARK-21741

tuning:
    missing subModels, hasSubModels - https://issues.apache.org/jira/browse/SPARK-22005
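
Until r2adj is exposed on the Python side, the value can be derived from plain R^2 and the model dimensions. The following is only a minimal sketch, assuming the standard textbook definition of adjusted R^2; the function name and its arguments are hypothetical and are not part of the PySpark API:

```python
def adjusted_r2(r2, num_instances, num_features, fit_intercept=True):
    """Adjusted R^2 from plain R^2 (standard textbook definition).

    Hypothetical helper for illustration only; not part of PySpark.
    """
    # One degree of freedom is consumed by the intercept, if it was fitted.
    intercept_dof = 1 if fit_intercept else 0
    residual_dof = num_instances - num_features - intercept_dof
    if residual_dof <= 0:
        raise ValueError("need more instances than fitted coefficients")
    return 1.0 - (1.0 - r2) * (num_instances - intercept_dof) / residual_dof
```

The inputs would typically come from a fitted model and its training summary; which attributes are available for that depends on the PySpark version in use.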


> ML 2.3 QA: API: Python API coverage
> -----------------------------------
>
>                 Key: SPARK-23109
>                 URL: https://issues.apache.org/jira/browse/SPARK-23109
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML, PySpark
>            Reporter: Joseph K. Bradley
>            Priority: Blocker
>
> For new public APIs added to MLlib ({{spark.ml}} only), we need to check the generated HTML doc and compare the Scala & Python versions.
> * *GOAL*: Audit and create JIRAs to fix in the next release.
> * *NON-GOAL*: This JIRA is _not_ for fixing the API parity issues.
> We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental.  These must be recorded and added in the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle.  *Please use a _separate_ JIRA (linked below as "requires") for this list of to-do items.*
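
The "missing classes/methods/parameters" check described above amounts to a set difference between the two API surfaces. A rough sketch of that idea, assuming the member names have already been collected by hand or from the generated docs; the function name and the two small inventories are illustrative only, not a real extraction of the Spark APIs:

```python
def missing_members(scala_api, python_api):
    """Map each class to the sorted members present in Scala but absent in Python."""
    report = {}
    for cls, scala_members in scala_api.items():
        python_members = python_api.get(cls, set())
        missing = sorted(set(scala_members) - set(python_members))
        if missing:
            report[cls] = missing
    return report


# Illustrative inventories based on two findings from the comment above.
scala_api = {
    "GBTClassificationModel": {"featureImportances", "numClasses"},
    "LinearRegressionSummary": {"r2", "r2adj"},
}
python_api = {
    "GBTClassificationModel": {"featureImportances"},
    "LinearRegressionSummary": {"r2"},
}

for cls, members in sorted(missing_members(scala_api, python_api).items()):
    print(f"{cls} - missing {', '.join(members)}")
```

This only catches name-level gaps; doc completeness and default param values still need a manual comparison.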


