You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by VinceShieh <gi...@git.apache.org> on 2017/03/10 07:03:22 UTC

[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

GitHub user VinceShieh opened a pull request:

    https://github.com/apache/spark/pull/17237

    [SPARK-19852][PYSPARK][ML] Update Python API setHandleInvalid for StringIndexer

    ## What changes were proposed in this pull request?
    
    This PR is to maintain API parity with changes made in SPARK-17498 to support a new option
    'keep' in StringIndexer to handle unseen labels with pyspark
    
    ## How was this patch tested?
    existing tests
    testing is done with new doctests
    
    Signed-off-by: VinceShieh <vi...@intel.com>


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/VinceShieh/spark spark-19852

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17237
    
----
commit d94dc68a8c1a5c082cf3de6c7e4d429bfd24d817
Author: VinceShieh <vi...@intel.com>
Date:   2017-03-10T06:50:41Z

    [SPARK-19852][PYSPARK][ML] Update Python API for StringIndexer setHandleInvalid
    
    This PR reflect the changes made in SPARK-17498 on pyspark to support a new option
    'keep' in StringIndexer to handle unseen labels
    
    Signed-off-by: VinceShieh <vi...@intel.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74308/testReport)** for PR 17237 at commit [`affeeb7`](https://github.com/apache/spark/commit/affeeb770faf8fc3b5dc924e8e23704dee3f21ba).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74312/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74301/testReport)** for PR 17237 at commit [`d94dc68`](https://github.com/apache/spark/commit/d94dc68a8c1a5c082cf3de6c7e4d429bfd24d817).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74305/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74305/testReport)** for PR 17237 at commit [`f1d9bcb`](https://github.com/apache/spark/commit/f1d9bcb3d615444a3c326f0b6b0f7999edecdf4f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74301/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17237


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74308/testReport)** for PR 17237 at commit [`affeeb7`](https://github.com/apache/spark/commit/affeeb770faf8fc3b5dc924e8e23704dee3f21ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17237#discussion_r113983452
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1936,6 +1935,14 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
         >>> sorted(set([(i[0], str(i[1])) for i in itd.select(itd.id, itd.label2).collect()]),
         ...     key=lambda x: x[0])
         [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'a'), (5, 'c')]
    +    >>> testData2 = sc.parallelize([Row(id=0, label="a"), Row(id=1, label="d"),
    +    ...     Row(id=2, label=None)], 2)
    +    >>> dfKeep= spark.createDataFrame(testData2)
    +    >>> modelKeep = stringIndexer.setHandleInvalid("keep").fit(stringIndDf)
    +    >>> tdK = modelKeep.transform(dfKeep)
    +    >>> sorted(set([(i[0], i[1]) for i in tdK.select(tdK.id, tdK.indexed).collect()]),
    +    ...     key=lambda x: x[0])
    +    [(0, 0.0), (1, 3.0), (2, 3.0)]
    --- End diff --
    
    Could you move the newly added test to ```tests.py```? We keep the basic doc tests here both for test and example, other tests should be placed at ```tests.py```. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17237#discussion_r114402495
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1936,6 +1935,14 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
         >>> sorted(set([(i[0], str(i[1])) for i in itd.select(itd.id, itd.label2).collect()]),
         ...     key=lambda x: x[0])
         [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'a'), (5, 'c')]
    +    >>> testData2 = sc.parallelize([Row(id=0, label="a"), Row(id=1, label="d"),
    +    ...     Row(id=2, label=None)], 2)
    +    >>> dfKeep= spark.createDataFrame(testData2)
    +    >>> modelKeep = stringIndexer.setHandleInvalid("keep").fit(stringIndDf)
    +    >>> tdK = modelKeep.transform(dfKeep)
    +    >>> sorted(set([(i[0], i[1]) for i in tdK.select(tdK.id, tdK.indexed).collect()]),
    +    ...     key=lambda x: x[0])
    +    [(0, 0.0), (1, 3.0), (2, 3.0)]
    --- End diff --
    
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74711/testReport)** for PR 17237 at commit [`1d2f28f`](https://github.com/apache/spark/commit/1d2f28f2449691fe7efe3e7dfbfe577ad8a56c1f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Thanks for the PR!  I just merged the fix for https://issues.apache.org/jira/browse/SPARK-11569 which will affect this PR.  Would you mind updating this PR to include SPARK-11569's handling of null values?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74301/testReport)** for PR 17237 at commit [`d94dc68`](https://github.com/apache/spark/commit/d94dc68a8c1a5c082cf3de6c7e4d429bfd24d817).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74312/testReport)** for PR 17237 at commit [`b4bb765`](https://github.com/apache/spark/commit/b4bb765a672d53d7fdd8b0378c7a5762ec881078).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #76181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76181/testReport)** for PR 17237 at commit [`1d2f28f`](https://github.com/apache/spark/commit/1d2f28f2449691fe7efe3e7dfbfe577ad8a56c1f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by VinceShieh <gi...@git.apache.org>.
Github user VinceShieh commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Sure. No problem!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76181/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17237#discussion_r116522476
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1936,6 +1935,14 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
         >>> sorted(set([(i[0], str(i[1])) for i in itd.select(itd.id, itd.label2).collect()]),
         ...     key=lambda x: x[0])
         [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'a'), (5, 'c')]
    +    >>> testData2 = sc.parallelize([Row(id=0, label="a"), Row(id=1, label="d"),
    +    ...     Row(id=2, label=None)], 2)
    +    >>> dfKeep= spark.createDataFrame(testData2)
    +    >>> modelKeep = stringIndexer.setHandleInvalid("keep").fit(stringIndDf)
    +    >>> tdK = modelKeep.transform(dfKeep)
    +    >>> sorted(set([(i[0], i[1]) for i in tdK.select(tdK.id, tdK.indexed).collect()]),
    +    ...     key=lambda x: x[0])
    +    [(0, 0.0), (1, 3.0), (2, 3.0)]
    --- End diff --
    
    @VinceShieh Do you have time to update this PR? We would like to get this in 2.2. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

Posted by VinceShieh <gi...@git.apache.org>.
Github user VinceShieh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17237#discussion_r115186158
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1936,6 +1935,14 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
         >>> sorted(set([(i[0], str(i[1])) for i in itd.select(itd.id, itd.label2).collect()]),
         ...     key=lambda x: x[0])
         [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'a'), (5, 'c')]
    +    >>> testData2 = sc.parallelize([Row(id=0, label="a"), Row(id=1, label="d"),
    +    ...     Row(id=2, label=None)], 2)
    +    >>> dfKeep= spark.createDataFrame(testData2)
    +    >>> modelKeep = stringIndexer.setHandleInvalid("keep").fit(stringIndDf)
    +    >>> tdK = modelKeep.transform(dfKeep)
    +    >>> sorted(set([(i[0], i[1]) for i in tdK.select(tdK.id, tdK.indexed).collect()]),
    +    ...     key=lambda x: x[0])
    +    [(0, 0.0), (1, 3.0), (2, 3.0)]
    --- End diff --
    
    Okay. Sure @yanboliang @holdenk 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    We are closing it due to inactivity. please do reopen if you want to push it forward. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74711/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    gentle re-ping


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74711/testReport)** for PR 17237 at commit [`1d2f28f`](https://github.com/apache/spark/commit/1d2f28f2449691fe7efe3e7dfbfe577ad8a56c1f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74305/testReport)** for PR 17237 at commit [`f1d9bcb`](https://github.com/apache/spark/commit/f1d9bcb3d615444a3c326f0b6b0f7999edecdf4f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Gentle ping to @VinceShieh to see if you have a chance to update this with the test movement & merge in the latest from master branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74308/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #76181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76181/testReport)** for PR 17237 at commit [`1d2f28f`](https://github.com/apache/spark/commit/1d2f28f2449691fe7efe3e7dfbfe577ad8a56c1f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17237
  
    **[Test build #74312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74312/testReport)** for PR 17237 at commit [`b4bb765`](https://github.com/apache/spark/commit/b4bb765a672d53d7fdd8b0378c7a5762ec881078).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org