Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/24 14:40:14 UTC

[GitHub] [spark] srowen opened a new pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

srowen opened a new pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684
 
 
   ### What changes were proposed in this pull request?
   
   Set the supplied output col name as intended when StringIndexer transforms an input after setOutputCols is used.
   
   ### Why are the changes needed?
   
   Otherwise the output column names are wrong, and downstream pipeline components fail.
   
   ### Does this PR introduce any user-facing change?
   
   Yes, in the sense that it fixes incorrect behavior; otherwise no.
   
   ### How was this patch tested?
   
   Existing tests plus new direct tests of the schema.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#discussion_r383388228
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala
 ##########
 @@ -107,7 +107,7 @@ private[feature] trait StringIndexerBase extends Params with HasHandleInvalid wi
         s"but got $inputDataType.")
     require(schema.fields.forall(_.name != outputColName),
       s"Output column $outputColName already exists.")
-    NominalAttribute.defaultAttr.withName($(outputCol)).toStructField()
+    NominalAttribute.defaultAttr.withName(outputColName).toStructField()
 
 Review comment:
   Good catch! Thanks!
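
To illustrate why the one-line change in the hunk above matters, here is a minimal, self-contained sketch in plain Scala. It is not Spark's actual code: `Field`, `OutputColSketch`, `defaultOutputCol`, `validateBuggy`, and `validateFixed` are hypothetical names, and the placeholder value stands in for whatever the single-column `outputCol` param resolves to when only `setOutputCols` has been called. The sketch assumes the validation runs once per requested output column, as the `outputColName` argument in the diff suggests.

```scala
// Hypothetical stand-in for a schema field; only the name matters here.
final case class Field(name: String)

object OutputColSketch {
  // Assumed placeholder for the single-column `outputCol` param value,
  // which is not the name the user passed to setOutputCols.
  val defaultOutputCol = "strIdx_default__output"

  // Before the fix: the per-column name is checked for collisions,
  // but the returned field is named after the single-column param.
  def validateBuggy(existing: Seq[Field], outputColName: String): Field = {
    require(existing.forall(_.name != outputColName),
      s"Output column $outputColName already exists.")
    Field(defaultOutputCol)
  }

  // After the fix: the returned field carries the per-column name.
  def validateFixed(existing: Seq[Field], outputColName: String): Field = {
    require(existing.forall(_.name != outputColName),
      s"Output column $outputColName already exists.")
    Field(outputColName)
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Field("a"), Field("b"))
    val requested = Seq("a_idx", "b_idx")
    // Every output field ends up with the same placeholder name.
    println(requested.map(validateBuggy(input, _).name))
    // Each output field keeps the name that was requested.
    println(requested.map(validateFixed(input, _).name))
  }
}
```

In the buggy variant every output field shares one name, so a downstream stage that looks up `a_idx` or `b_idx` in the schema would not find it, which matches the failure described in the PR summary.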



[GitHub] [spark] AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590395946
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118872/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590395946
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118872/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590395337
 
 
   **[Test build #118872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118872/testReport)** for PR 27684 at commit [`d07debf`](https://github.com/apache/spark/commit/d07debf449ee3ac7a72d888849d98f6b0f8470af).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590395936
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590354811
 
 
   **[Test build #118872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118872/testReport)** for PR 27684 at commit [`d07debf`](https://github.com/apache/spark/commit/d07debf449ee3ac7a72d888849d98f6b0f8470af).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590355575
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23621/
   Test PASSed.



[GitHub] [spark] srowen closed pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684
 
 
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590355565
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590355575
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23621/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590395936
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590354811
 
 
   **[Test build #118872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118872/testReport)** for PR 27684 at commit [`d07debf`](https://github.com/apache/spark/commit/d07debf449ee3ac7a72d888849d98f6b0f8470af).



[GitHub] [spark] srowen commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590647425
 
 
   Merged to master/3.0



[GitHub] [spark] AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27684: [SPARK-30939][ML] Correctly set output col when StringIndexer.setOutputCols is used
URL: https://github.com/apache/spark/pull/27684#issuecomment-590355565
 
 
   Merged build finished. Test PASSed.
