Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/10 22:43:55 UTC

[GitHub] [spark] xinrong-databricks opened a new pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

xinrong-databricks opened a new pull request #33964:
URL: https://github.com/apache/spark/pull/33964


   ### What changes were proposed in this pull request?
   Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`
   
   
   ### Why are the changes needed?
   For better performance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Existing tests.
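
A rough illustration of the refactor described above, in plain Python rather than PySpark. The diff itself is not part of this thread, so the function names `select_by_or_chain` and `select_by_membership` are hypothetical and only mimic the shape of the two predicates. The first ORs together one equality test per requested position; the second does a single membership test, which is the role `Column.isin` plays on the Spark side.

```python
from functools import reduce
import operator


def select_by_or_chain(rows, positions):
    """Keep rows whose position equals any requested one, using a chain of ORs."""
    wanted = list(positions)

    def predicate(pos):
        # Mirrors (pos == p0) | (pos == p1) | ...; one comparison per requested position.
        return reduce(operator.or_, (pos == p for p in wanted), False)

    return [row for pos, row in enumerate(rows) if predicate(pos)]


def select_by_membership(rows, positions):
    """Keep rows whose position is in the requested set, using one membership test."""
    wanted = set(positions)
    return [row for pos, row in enumerate(rows) if pos in wanted]


rows = ["a", "b", "c", "d", "e", "f"]
print(select_by_or_chain(rows, [0, 2, 5]))
print(select_by_membership(rows, [0, 2, 5]))
```

Both functions return the same rows. The difference that matters for the PR is that the OR chain grows with the number of requested positions, while the membership test stays a single expression.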


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919536044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47779/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920303827


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47821/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921340401


   **[Test build #143372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143372/testReport)** for PR 33964 at commit [`8b3e6fb`](https://github.com/apache/spark/commit/8b3e6fb9b333394211ce5483c6eeca82794f1716).




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-922126179


   BTW I couldn't reproduce the code generation issue. The reason `isin` is slower than Broadcast is its slow planning time.




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [WIP][SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-918645242


   To be adjusted according to https://github.com/apache/spark/pull/33982.




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920330754


   **[Test build #143322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143322/testReport)** for PR 33964 at commit [`cfa85b1`](https://github.com/apache/spark/commit/cfa85b1da7d5633c5f505ab001ae90ac7fcdf90f).




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919516856


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143276/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919527965


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47779/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921338361


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143369/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921346799


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47877/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921364291


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47879/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920372223


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47825/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921325192


   **[Test build #143369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143369/testReport)** for PR 33964 at commit [`a3e92bf`](https://github.com/apache/spark/commit/a3e92bf5017f03ccf26b79f8fc118d4d48ca162a).




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920297514


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47821/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919536044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47779/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920364282


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143322/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920288666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143318/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919629136


   except that mypy fails:
   
   ```
   mypy checks failed:
   python/pyspark/pandas/indexing.py:1668: error: Incompatible types in assignment (expression has type "List[<nothing>]", variable has type "Column")
   python/pyspark/pandas/indexing.py:1671: error: "Column" not callable
   python/pyspark/pandas/indexing.py:1672: error: Unsupported left operand type for | ("object")
   Found 3 errors in 1 file (checked 315 source files)
   ```
   
   cc @ueshin too fyi for a second look
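
The offending lines of `indexing.py` are not quoted in this thread, so the following is only a guess at the pattern behind errors like these. The first mypy message is what appears when a name already typed as `Column` is rebound to a list. One common way to avoid that is to collect the conditions under a separate, explicitly typed name and then combine them. The `Column` class and the `any_of` helper here are stand-ins invented for this sketch, not the PySpark class or the PR's code.

```python
from functools import reduce
from typing import List


class Column:
    """Stand-in for pyspark.sql.Column, used only in this sketch."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __or__(self, other: "Column") -> "Column":
        return Column(f"({self.expr} OR {other.expr})")


def any_of(conds: List[Column]) -> Column:
    # The list has its own name and type, so no Column-typed name is rebound to a list.
    return reduce(lambda left, right: left | right, conds)


combined = any_of([Column("a"), Column("b"), Column("c")])
print(combined.expr)
```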




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917291044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47664/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921325192


   **[Test build #143369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143369/testReport)** for PR 33964 at commit [`a3e92bf`](https://github.com/apache/spark/commit/a3e92bf5017f03ccf26b79f8fc118d4d48ca162a).




[GitHub] [spark] AmplabJenkins commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917288237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143160/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920367125


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47825/
   




[GitHub] [spark] ueshin commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
ueshin commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-923360325


   Thanks! merging to master.




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921361039


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47879/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920257790


   **[Test build #143318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143318/testReport)** for PR 33964 at commit [`057c542`](https://github.com/apache/spark/commit/057c542def14d55b1d0d2c14b6adbcd21703b640).




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
   
   Broadcast DF (Join) is the best, but it's hard to apply due to the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower because of its planning time. Considering execution time only (as below), the long projection `runWithOrChain` is faster. However, its planning time makes its total time the worst.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin
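
As a quick check on the "3 times slower" figure, the ratios can be worked out from the three totals quoted in the comment. The dictionary keys are just labels chosen for this sketch.

```python
# Total times in milliseconds, copied from the comment above (100 predicates).
timings_ms = {
    "long projection": 188786,
    "isin": 61167,
    "broadcast join": 54841,
}

baseline = timings_ms["long projection"]

# How many times faster each approach is than the long projection.
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, ms in timings_ms.items():
    print(f"{name}: {ms} ms, {speedups[name]:.2f}x vs. long projection")
```

This gives roughly 3.09x for `isin` and 3.44x for the broadcast join, which matches the statement that the long projection is about 3 times slower than either alternative.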




[GitHub] [spark] SparkQA commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917256059


   **[Test build #143160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143160/testReport)** for PR 33964 at commit [`8c7c572`](https://github.com/apache/spark/commit/8c7c572d3cfd7baff498f76df8ea527e189028e0).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921365002








[GitHub] [spark] SparkQA removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920330754


   **[Test build #143322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143322/testReport)** for PR 33964 at commit [`cfa85b1`](https://github.com/apache/spark/commit/cfa85b1da7d5633c5f505ab001ae90ac7fcdf90f).




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
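
The ratios behind these timings can be checked with a few lines of arithmetic. This is only a sketch over the three numbers reported above; the dictionary keys are labels chosen here for illustration, not names from the benchmark code.

```python
# Total times reported in this comment, in milliseconds.
timings_ms = {
    "long projection": 188786,
    "isin": 61167,
    "broadcast join": 54841,
}

# Speedup of each approach relative to the long projection (original approach).
baseline = timings_ms["long projection"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, ms in timings_ms.items():
    print(f"{name}: {ms} ms, {speedups[name]:.2f}x vs long projection")
```

On these numbers `isin` comes out roughly 3.1 times faster than the long projection, and the broadcast join roughly 3.4 times faster.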
   
   Broadcast DF (Join) is the fastest, but it is hard to apply given the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower overall because of its planning time. Considering the execution time only (as below), the long projection `runWithOrChain` is faster. However, its planning time makes its total time the worst.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
   
   Broadcast DF (Join) is the fastest, but it is hard to apply given the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower overall because of its planning time. Considering the execution time only (as below), the long projection `runWithInCollection` is still faster.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920303793


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47821/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919499856


   **[Test build #143276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143276/testReport)** for PR 33964 at commit [`e7de5be`](https://github.com/apache/spark/commit/e7de5beaa508dc52e060b22325a1be50d6559c64).




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920272166


   **[Test build #143318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143318/testReport)** for PR 33964 at commit [`057c542`](https://github.com/apache/spark/commit/057c542def14d55b1d0d2c14b6adbcd21703b640).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919536019


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47779/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919516856


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143276/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921346841


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47877/
   




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
   
   Broadcast DF (Join) is the fastest, but it is hard to apply given the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower overall because of its planning time. Considering the execution time only (as below), the long projection `runWithOrChain` is faster.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-923343694


   CC @ueshin @itholic




[GitHub] [spark] SparkQA commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917285492


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47664/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920303827


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47821/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921337446


   **[Test build #143369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143369/testReport)** for PR 33964 at commit [`a3e92bf`](https://github.com/apache/spark/commit/a3e92bf5017f03ccf26b79f8fc118d4d48ca162a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921328484


   In a rough benchmark, `Column.isin` is about 3 times as fast as the long projection approach, so the PR is updated to use `Column.isin` even when the number of values exceeds `compute.isin_limit`.




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921328484


   In a rough benchmark, `Column.isin` is about 3 times as fast as the long projection approach, so the PR is updated to use `Column.isin` even when the number of values exceeds `compute.isin_limit`.
   
   CC @HyukjinKwon @ueshin 




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920372195


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47825/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919516448


   **[Test build #143276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143276/testReport)** for PR 33964 at commit [`e7de5be`](https://github.com/apache/spark/commit/e7de5beaa508dc52e060b22325a1be50d6559c64).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917291044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47664/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921346841


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47877/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921362123


   > In a rough benchmark, `Column.isin` is about 3 times as fast as the long projection approach, so the PR is updated to use `Column.isin` even when the number of values exceeds `compute.isin_limit`.
   
   How many predicates did you use? IIRC, you hit a code generation issue or something similar if there are too many predicates for `isin`. Maybe let's keep the config for now.
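
Keeping the config could look roughly like the sketch below: a size check against the limit decides which predicate style to build. The function name, the returned labels, the limit value passed in the usage lines, and the choice of the long projection as the fallback are all assumptions made for illustration; none of them are taken from the PR diff.

```python
def choose_row_filter_strategy(num_values: int, isin_limit: int) -> str:
    """Hypothetical helper: pick a predicate style from the number of row positions.

    `isin_limit` stands in for the value of the `compute.isin_limit` option;
    how the real code reads that option is not shown in this thread.
    """
    if num_values <= isin_limit:
        # Small lists: a single `Column.isin` predicate.
        return "isin"
    # Large lists: assumed fallback to the original long projection (OR chain).
    return "long_projection"


# Usage with an illustrative limit of 80 (an assumed value).
print(choose_row_filter_strategy(10, isin_limit=80))
print(choose_row_filter_strategy(100, isin_limit=80))
```

Whether the boundary should be inclusive, and which fallback is best above the limit, are design choices this sketch does not settle.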




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
   
   Broadcast DF (Join) is the fastest, but it is hard to apply given the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower overall because of its planning time. Considering the execution time only (as below), the long projection `runWithOrChain` is faster.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825702-30a1e75a-2f7b-41e6-8d96-5ac12f9fc3a4.png)
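
One possible way to picture the difference in planning cost is to compare the shape of the two predicates. The sketch below uses a toy `Expr` class as a stand-in for a column expression; it is not PySpark and not the benchmark code, and the helper names `or_chain` and `in_collection` are hypothetical. It only shows that folding equality checks with `|` yields a tree whose depth grows with the number of values, while a single `isin` node stays flat. Whether this fully explains the planning time observed in the Spark UI is an assumption.

```python
from functools import reduce
import operator


class Expr:
    """Toy expression node; a hypothetical stand-in, not a PySpark Column."""

    def __init__(self, op, children=()):
        self.op = op
        self.children = tuple(children)

    def __eq__(self, value):
        # Mimics `col == literal`: builds an equality node instead of comparing.
        return Expr("eq", (self, Expr(f"lit({value!r})")))

    def __or__(self, other):
        # Mimics `pred1 | pred2`: builds a binary OR node.
        return Expr("or", (self, other))

    def isin(self, *values):
        # Mimics `col.isin(...)`: one node holding the column and all literals.
        return Expr("in", (self, *(Expr(f"lit({v!r})") for v in values)))

    def depth(self):
        return 1 + max((child.depth() for child in self.children), default=0)

    def size(self):
        return 1 + sum(child.size() for child in self.children)


def or_chain(col, values):
    # Long projection style: (col == v0) | (col == v1) | ...
    return reduce(operator.or_, (col == v for v in values))


def in_collection(col, values):
    # Single membership predicate.
    return col.isin(*values)


positions = list(range(100))  # 100 predicates, as in the benchmark above
chained = or_chain(Expr("col(pos)"), positions)
single = in_collection(Expr("col(pos)"), positions)

print("or-chain depth:", chained.depth(), "nodes:", chained.size())
print("isin depth:", single.depth(), "nodes:", single.size())
```

With real PySpark columns the same two helper bodies would apply to a `Column` object, since `Column` supports `==`, `|`, and `isin`.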
   
   
   CC @HyukjinKwon @ueshin




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920372223


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47825/
   




[GitHub] [spark] ueshin closed pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
ueshin closed pull request #33964:
URL: https://github.com/apache/spark/pull/33964


   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921353002


   **[Test build #143372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143372/testReport)** for PR 33964 at commit [`8b3e6fb`](https://github.com/apache/spark/commit/8b3e6fb9b333394211ce5483c6eeca82794f1716).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   100 predicates are used.
   
   The long projection takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (Join) takes 54841 ms.
   
   Broadcast DF (Join) is the fastest, but it is hard to apply given the current function structure.
   
   I checked the Spark UI: the long projection (the original approach) is about 3 times slower overall because of its planning time. Considering the execution time only (as below), the long projection `runWithInCollection` is faster. However, its planning time makes its total time the worst.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin




[GitHub] [spark] SparkQA commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917264450


   **[Test build #143160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143160/testReport)** for PR 33964 at commit [`8c7c572`](https://github.com/apache/spark/commit/8c7c572d3cfd7baff498f76df8ea527e189028e0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917256059


   **[Test build #143160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143160/testReport)** for PR 33964 at commit [`8c7c572`](https://github.com/apache/spark/commit/8c7c572d3cfd7baff498f76df8ea527e189028e0).




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920257790


   **[Test build #143318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143318/testReport)** for PR 33964 at commit [`057c542`](https://github.com/apache/spark/commit/057c542def14d55b1d0d2c14b6adbcd21703b640).




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920288666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143318/
   




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920350173


   **[Test build #143322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143322/testReport)** for PR 33964 at commit [`cfa85b1`](https://github.com/apache/spark/commit/cfa85b1da7d5633c5f505ab001ae90ac7fcdf90f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921365007








[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921338361


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143369/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921340401


   **[Test build #143372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143372/testReport)** for PR 33964 at commit [`8b3e6fb`](https://github.com/apache/spark/commit/8b3e6fb9b333394211ce5483c6eeca82794f1716).




[GitHub] [spark] xinrong-databricks edited a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921943161


   Benchmark with 100 predicates:
   
   The long projection (the original approach) takes 188786 ms.
   `isin` takes 61167 ms.
   Broadcast DF (join) takes 54841 ms.
   
   The broadcast DF (join) is the fastest, but it is hard to apply given the current structure of the function.
   
   I checked the Spark UI: the long projection is about 3 times slower because of its planning time. Considering execution time only (see the screenshot below), `runWithInCollection` is still faster.
   
   ![image](https://user-images.githubusercontent.com/47337188/133825828-e376f0d8-3247-416b-a2c6-0b7a21ab7cb8.png)
   
   
   CC @HyukjinKwon @ueshin
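
As a quick check of the "3 times slower" figure, the ratios can be recomputed from the three timings reported above. This is only arithmetic on the numbers in this comment; the dictionary keys and the helper name `slowdown_vs` are illustrative and are not part of the Spark code base.

```python
# Timings reported in the comment above (milliseconds, 100 predicates).
timings_ms = {
    "long projection": 188786,  # original approach
    "isin": 61167,
    "broadcast join": 54841,
}

baseline = timings_ms["long projection"]


def slowdown_vs(name):
    """Return how many times slower the long projection is than `name`."""
    return baseline / timings_ms[name]


for name in ("isin", "broadcast join"):
    print(f"long projection vs {name}: {slowdown_vs(name):.2f}x slower")
```

With these numbers the long projection comes out roughly 3.1x slower than `isin` and roughly 3.4x slower than the broadcast join, which is consistent with the "3 times" estimate.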




[GitHub] [spark] SparkQA removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-919499856


   **[Test build #143276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143276/testReport)** for PR 33964 at commit [`e7de5be`](https://github.com/apache/spark/commit/e7de5beaa508dc52e060b22325a1be50d6559c64).




[GitHub] [spark] SparkQA commented on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917291025


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47664/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-920364282


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143322/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33964: Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-917288237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143160/
   




[GitHub] [spark] xinrong-databricks commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-923072095


   FYI @sigmod for https://github.com/apache/spark/pull/33964#issuecomment-921943161 in case you are interested.




[GitHub] [spark] SparkQA commented on pull request #33964: [SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33964:
URL: https://github.com/apache/spark/pull/33964#issuecomment-921342494


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47877/
   

