Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/20 02:33:58 UTC

[GitHub] [spark] harupy opened a new pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

harupy opened a new pull request #32245:
URL: https://github.com/apache/spark/pull/32245


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Fixes https://issues.apache.org/jira/browse/SPARK-35142
   
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r617135442



##########
File path: python/pyspark/ml/tests/test_algorithms.py
##########
@@ -115,6 +115,7 @@ def test_output_columns(self):
         model = ovr.fit(df)
         output = model.transform(df)
         self.assertEqual(output.columns, ["label", "features", "rawPrediction", "prediction"])
+        self.assertIsInstance(output.schema["rawPrediction"].dataType, VectorUDT)

Review comment:
       Got it!






[GitHub] [spark] SparkQA removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823720317


   **[Test build #137708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137708/testReport)** for PR 32245 at commit [`b6fabb3`](https://github.com/apache/spark/commit/b6fabb3eec661805f2a89eb839d01f7d5625e0f8).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823066628


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822942777


   **[Test build #137665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137665/testReport)** for PR 32245 at commit [`3f75ab2`](https://github.com/apache/spark/commit/3f75ab2f69667009afc55a9c985c6fa84e8ba04f).




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822997334


   **[Test build #137668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137668/testReport)** for PR 32245 at commit [`3c2ac95`](https://github.com/apache/spark/commit/3c2ac9521d7e4ce60a06a0291b9abe466908340c).




[GitHub] [spark] HyukjinKwon commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823859688


   BTW, the tests passed at https://github.com/harupy/spark/actions/runs/769366516. GitHub Actions didn't link that run to this PR properly, for some reason.
   
   I will leave it to @WeichenXu123 then.




[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955498


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137665/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822936795


   Can one of the admins verify this patch?




[GitHub] [spark] viirya commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823901356


   I don't see a backport to 2.4. Do you plan to backport it? @WeichenXu123 @harupy?




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955744


   **[Test build #137666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137666/testReport)** for PR 32245 at commit [`5e05b50`](https://github.com/apache/spark/commit/5e05b5053dc9c84f4ae10b8804e07e2041a2c321).




[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822956276


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137666/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822974434


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42194/
   




[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823725038


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137708/
   




[GitHub] [spark] HyukjinKwon commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823851644


   @viirya, are you preparing the Spark 2.4 RC now? This is supposed to be in Spark 2.4 too, but it isn't a regression, so it doesn't block the release. It's just a nice-to-have, so if you're preparing the RC, it should be fine not to backport it.




[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `accColName` column (`double array`) to `VectorUDT` here?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column?
   
   https://github.com/apache/spark/blob/1630d64cab216f1404bf0940483ec3ecb86732d1/python/pyspark/ml/classification.py#L1964-L2020




[GitHub] [spark] viirya commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-824165577


   Thanks for confirming. @harupy @HyukjinKwon 




[GitHub] [spark] WeichenXu123 commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616356650



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       @HyukjinKwon 
   I want to know: if no UDF return type is specified, how does return type inference work? Does it check the UDF return type of every row?
   The master code failed in some cases, and the return column type in the schema became "String".






[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822974434


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42194/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823760479


   **[Test build #137713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137713/testReport)** for PR 32245 at commit [`ed26d2c`](https://github.com/apache/spark/commit/ed26d2cef4d321b0c5fee7a2a851f9535beb12c9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823778063








[GitHub] [spark] WeichenXu123 commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r617131801



##########
File path: python/pyspark/ml/tests/test_algorithms.py
##########
@@ -115,6 +115,7 @@ def test_output_columns(self):
         model = ovr.fit(df)
         output = model.transform(df)
         self.assertEqual(output.columns, ["label", "features", "rawPrediction", "prediction"])
+        self.assertIsInstance(output.schema["rawPrediction"].dataType, VectorUDT)

Review comment:
       It would be better to add a separate unit test named "fix SPARK-35142:...".






[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column?
   
   https://github.com/apache/spark/blob/1630d64cab216f1404bf0940483ec3ecb86732d1/python/pyspark/ml/classification.py#L1964




[GitHub] [spark] WeichenXu123 commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r617131553



##########
File path: python/pyspark/ml/tests/test_algorithms.py
##########
@@ -115,6 +115,7 @@ def test_output_columns(self):
         model = ovr.fit(df)
         output = model.transform(df)
         self.assertEqual(output.columns, ["label", "features", "rawPrediction", "prediction"])
+        self.assertIsInstance(output.schema["rawPrediction"].dataType, VectorUDT)

Review comment:
       Let's add a test that runs "model.transform(df).head()" and ensures it does not raise an error.






[GitHub] [spark] SparkQA removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955744


   **[Test build #137666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137666/testReport)** for PR 32245 at commit [`5e05b50`](https://github.com/apache/spark/commit/5e05b5053dc9c84f4ae10b8804e07e2041a2c321).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r617284560



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       It seems `pred.show()` triggers an exception too? What does it return in other methods?






[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823724892


   **[Test build #137708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137708/testReport)** for PR 32245 at commit [`b6fabb3`](https://github.com/apache/spark/commit/b6fabb3eec661805f2a89eb839d01f7d5625e0f8).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823846636


   Looks good. @harupy, would you mind filling in the PR description per the template?




[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616358286



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Got it.






[GitHub] [spark] SparkQA removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822997334


   **[Test build #137668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137668/testReport)** for PR 32245 at commit [`3c2ac95`](https://github.com/apache/spark/commit/3c2ac9521d7e4ce60a06a0291b9abe466908340c).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955498








[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823778063








[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823024664








[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast here?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column?
   
   https://github.com/apache/spark/blob/1630d64cab216f1404bf0940483ec3ecb86732d1/python/pyspark/ml/classification.py#L1964-L2009




[GitHub] [spark] harupy commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.




[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` output the raw prediction column?




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955816








[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `accColName` column (type: `ArrayType(DoubleType())`) to `VectorUDT()` here?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822942777


   **[Test build #137665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137665/testReport)** for PR 32245 at commit [`3f75ab2`](https://github.com/apache/spark/commit/3f75ab2f69667009afc55a9c985c6fa84e8ba04f).




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823720317


   **[Test build #137708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137708/testReport)** for PR 32245 at commit [`b6fabb3`](https://github.com/apache/spark/commit/b6fabb3eec661805f2a89eb839d01f7d5625e0f8).




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823738562


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42236/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822956276


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137666/
   




[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `rawPrediction` column to `VectorUDT` here instead of using `udf`?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `accColName` column to `VectorUDT` here instead of using `udf`?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```

##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `accColName` column to `VectorUDT` here?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```






[GitHub] [spark] WeichenXu123 commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823845888


   LGTM




[GitHub] [spark] WeichenXu123 commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823883887


   @harupy 
   
   Backporting to branch-3.1 causes conflicts.
   Could you create a PR against apache/spark branch-3.1?
   
   ```
   ++<<<<<<< HEAD
    +    def test_parallelism_doesnt_change_output(self):
   ++=======
   +     def test_raw_prediction_column_is_of_vector_type(self):
   +         # SPARK-35142: `OneVsRestModel` outputs raw prediction as a string column
   +         df = self.spark.createDataFrame([(0.0, Vectors.dense(1.0, 0.8)),
   +                                          (1.0, Vectors.sparse(2, [], [])),
   +                                          (2.0, Vectors.dense(0.5, 0.5))],
   +                                         ["label", "features"])
   +         lr = LogisticRegression(maxIter=5, regParam=0.01)
   +         ovr = OneVsRest(classifier=lr, parallelism=1)
   +         model = ovr.fit(df)
   +         row = model.transform(df).head()
   +         self.assertIsInstance(row["rawPrediction"], DenseVector)
   + 
   +     def test_parallelism_does_not_change_output(self):
   ++>>>>>>> b6350f5bb0... [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`
   ```




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822956263


   **[Test build #137666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137666/testReport)** for PR 32245 at commit [`5e05b50`](https://github.com/apache/spark/commit/5e05b5053dc9c84f4ae10b8804e07e2041a2c321).
    * This patch **fails Python style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column? Looks like it doesn't.
   
   https://github.com/apache/spark/blob/1630d64cab216f1404bf0940483ec3ecb86732d1/python/pyspark/ml/classification.py#L1964-L2009




[GitHub] [spark] WeichenXu123 commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616357907



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       @HyukjinKwon
   Why does only `transformed_df.head()` trigger this error?
   Does it indicate a bug in PySpark SQL UDFs?






[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823760479


   **[Test build #137713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137713/testReport)** for PR 32245 at commit [`ed26d2c`](https://github.com/apache/spark/commit/ed26d2cef4d321b0c5fee7a2a851f9535beb12c9).




[GitHub] [spark] HyukjinKwon commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823996541


   Okay, looks like we can skip Spark 2.4.




[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column?
   
   https://github.com/apache/spark/blob/1630d64cab216f1404bf0940483ec3ecb86732d1/python/pyspark/ml/classification.py#L1964-L2000




[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822974397










[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822955839


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42193/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823066628


   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823773327


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42241/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822936795


   Can one of the admins verify this patch?



[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616312217



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       Should I add a test here to ensure that the `rawPrediction` column is no longer of type `string`?
   
   https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/python/pyspark/ml/tests/test_algorithms.py#L108-L117





[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823766582


   **[Test build #137713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137713/testReport)** for PR 32245 at commit [`ed26d2c`](https://github.com/apache/spark/commit/ed26d2cef4d321b0c5fee7a2a851f9535beb12c9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822941637







[GitHub] [spark] viirya commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823855067


   > @viirya, are you preparing a Spark 2.4 RC now? This is supposed to be in Spark 2.4 too, but it isn't a regression, so it doesn't block the release. It's just a nice-to-have, so if you're already preparing the RC, it should be fine not to backport it.
   
   https://github.com/apache/spark/pull/32256 was just merged, so I have not started a new RC yet. I can wait for this.



[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823008089


   **[Test build #137668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137668/testReport)** for PR 32245 at commit [`3c2ac95`](https://github.com/apache/spark/commit/3c2ac9521d7e4ce60a06a0291b9abe466908340c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616310831



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       Can we just cast the `accColName` column (type: `ArrayType(DoubleType())`) to `VectorUDT()` here?
   
   ```suggestion
               aggregatedDataset = aggregatedDataset.withColumn(
                   self.getRawPredictionCol(), aggregatedDataset[accColName].cast(VectorUDT()))
   ```
   
   I think `cast` clarifies that we're just converting the data type here.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823742541


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42236/
   



[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616312217



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       Do I need to add a test here to ensure that the `rawPrediction` column is no longer of type `string`?
   
   https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/python/pyspark/ml/tests/test_algorithms.py#L108-L117





[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823725038


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137708/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823026417







[GitHub] [spark] AmplabJenkins commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823742541


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42236/
   



[GitHub] [spark] harupy edited a comment on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy edited a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823904455


   @viirya Got it. I'll open another PR for 2.4.
   
   ---
   
   Wait, does `OneVsRestModel` in 2.4 output the raw prediction column?



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616312904



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       Yeah, I think we'd better add a test if possible.





[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823775190


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42241/
   



[GitHub] [spark] WeichenXu123 commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616356246



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)
+            rawPredictionUDF = udf(func, VectorUDT())
             aggregatedDataset = aggregatedDataset.withColumn(
                 self.getRawPredictionCol(), rawPredictionUDF(aggregatedDataset[accColName]))

Review comment:
       We should use `udf(func, VectorUDT())`.





[GitHub] [spark] WeichenXu123 closed pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 closed pull request #32245:
URL: https://github.com/apache/spark/pull/32245


   



[GitHub] [spark] harupy commented on a change in pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on a change in pull request #32245:
URL: https://github.com/apache/spark/pull/32245#discussion_r616316046



##########
File path: python/pyspark/ml/classification.py
##########
@@ -3151,7 +3151,7 @@ def func(predictions):
                     predArray.append(x)
                 return Vectors.dense(predArray)
 
-            rawPredictionUDF = udf(func)

Review comment:
       Got it, added a test.





[GitHub] [spark] harupy commented on pull request #32245: [SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
harupy commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823889595


   @WeichenXu123 Opened a PR: https://github.com/apache/spark/pull/32269



[GitHub] [spark] SparkQA commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822951497


   **[Test build #137665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137665/testReport)** for PR 32245 at commit [`3f75ab2`](https://github.com/apache/spark/commit/3f75ab2f69667009afc55a9c985c6fa84e8ba04f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-823026415







[GitHub] [spark] WeichenXu123 commented on pull request #32245: [SPARK-35142][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on pull request #32245:
URL: https://github.com/apache/spark/pull/32245#issuecomment-822987901


   CC @zhengruifeng 

