Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/06 23:48:16 UTC

[GitHub] [spark] sunchao opened a new pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

sunchao opened a new pull request #29959:
URL: https://github.com/apache/spark/pull/29959


   ### What changes were proposed in this pull request?
   
   This PR is a follow-up to #29471 and makes the following improvements to `HadoopFSUtils`:
   1. Removes the extra `filterFun` from the listing API and combines it with the `filter`.
   2. Removes `SerializableBlockLocation` and `SerializableFileStatus`, given that `BlockLocation` and `FileStatus` are already serializable.
   3. Simplifies the file listing logic by relying on `FileSystem.listLocatedStatus` or `FileSystem.listStatus`, depending on the `ignoreLocality` flag. Currently, in contrast, we only call `listLocatedStatus` for `DistributedFileSystem` and `ViewFileSystem`, which means Spark cannot take advantage of more efficient `listLocatedStatus` implementations in other `FileSystem` subclasses.
   4. Hides the `isRootLevel` flag from the top-level API.
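
As a rough illustration of item 1, the sketch below folds a name-based predicate into a single path filter. It is a minimal sketch, not the actual `HadoopFSUtils` code: `SimplePathFilter`, `FilterSketch`, `skipHidden`, `combine`, and `list` are hypothetical names, and the trait only stands in for Hadoop's `PathFilter` so the example runs without Hadoop on the classpath.

```scala
// Hypothetical stand-in for Hadoop's PathFilter (assumption, not the real interface).
trait SimplePathFilter {
  def accept(path: String): Boolean
}

object FilterSketch {
  // Name-based predicate of the kind a separate `filterFun` argument might carry
  // (assumed shape: file name => keep?).
  val skipHidden: String => Boolean =
    name => !name.startsWith("_") && !name.startsWith(".")

  // Fold the optional user-supplied filter and the name predicate into one filter,
  // so the listing code only has to consult a single object.
  def combine(userFilter: Option[SimplePathFilter],
              nameFun: String => Boolean): SimplePathFilter =
    new SimplePathFilter {
      def accept(path: String): Boolean = {
        val name = path.substring(path.lastIndexOf('/') + 1)
        userFilter.forall(_.accept(path)) && nameFun(name)
      }
    }

  // Toy listing step: keep only the paths accepted by the combined filter.
  def list(paths: Seq[String], filter: SimplePathFilter): Seq[String] =
    paths.filter(filter.accept)
}
```

With one combined filter, callers pass a single argument instead of two, which is the kind of simplification item 1 describes.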
   
   ### Why are the changes needed?
   
   The main purpose is to simplify the logic within `HadoopFSUtils`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Change 3 above could potentially change user-visible behavior when `spark.sql.files.ignoreMissingFiles` is set and there are race conditions during listing.
   
   For instance, suppose locality is required and some files in the listing result are deleted right after the listing operation but before the subsequent `getFileBlockLocations` call. The previous implementation would return a partial listing result, whereas the new implementation returns an empty set, since the default `listLocatedStatus` calls `getFileBlockLocations` internally and a failure there causes the whole call to fail.
   
   On the other hand, there could also be cases where the new implementation returns a full list where the previous implementation returned a partial one, since `getFileBlockLocations` is no longer called as a separate step.
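
The first scenario can be modelled with a toy file system. This is only a sketch under stated assumptions: `ToyFs`, `listOld`, and `listNew` are hypothetical, the method names merely echo the Hadoop calls mentioned above, and the error handling (skip a missing file versus treat the whole directory as missing) is a simplified reading of the description rather than the actual Spark code.

```scala
import java.io.FileNotFoundException

object ListingRaceSketch {
  // Toy file system (hypothetical): `listed` is what the listing call saw,
  // `deleted` holds files removed right after the listing.
  final case class ToyFs(listed: Seq[String], deleted: Set[String]) {
    def listStatus(): Seq[String] = listed

    def getFileBlockLocations(file: String): Seq[String] =
      if (deleted.contains(file)) throw new FileNotFoundException(file)
      else Seq(s"host-for-$file")

    // Models the default behaviour described above: locations are resolved
    // inside the listing call, so one missing file fails the whole call.
    def listLocatedStatus(): Seq[(String, Seq[String])] =
      listed.map(f => f -> getFileBlockLocations(f))
  }

  // Previous approach: list first, then resolve locations per file,
  // skipping files that vanished in between (partial result).
  def listOld(fs: ToyFs): Seq[String] =
    fs.listStatus().flatMap { f =>
      try { fs.getFileBlockLocations(f); Some(f) }
      catch { case _: FileNotFoundException => None }
    }

  // New approach: a single call; a missing file is only caught at the
  // directory level, so the result is empty rather than partial.
  def listNew(fs: ToyFs): Seq[String] =
    try fs.listLocatedStatus().map(_._1)
    catch { case _: FileNotFoundException => Seq.empty }
}
```

In this model, deleting one of three listed files makes `listOld` return the two survivors while `listNew` returns nothing; when no file is deleted, both return the full list.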
   
   ### How was this patch tested?
   
   Existing unit tests (e.g., `FileIndexSuite`)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706418371


   Jenkins, add to whitelist




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712509003


   **[Test build #130018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130018/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r508037942



##########
File path: core/pom.xml
##########
@@ -562,7 +562,8 @@
       </properties>
     </profile>
     <profile>
-      <id>sparkr</id>
+
+    <id>sparkr</id>

Review comment:
       indentation?






[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712377119


   **[Test build #130012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130012/testReport)** for PR 29959 at commit [`d3f7f73`](https://github.com/apache/spark/commit/d3f7f7338c3c9af9810cdf7dbe6564d1a1adfdd5).




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729161637








[GitHub] [spark] asfgit closed pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #29959:
URL: https://github.com/apache/spark/pull/29959


   




[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728298851


   @holdenk sure - it's done.




[GitHub] [spark] sunchao removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706484435


   Removed the changes on file listing (may deal with it in a separate PR). @holdenk mind taking another look? Thanks.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728356364


   **[Test build #131175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131175/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] sunchao commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705823373


   Thanks @holdenk for the review. Yes this PR still needs a bit more work. Will update.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704628492


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34085/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704682573








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706501505


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705779006








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705743682








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710764579


   **[Test build #129942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129942/testReport)** for PR 29959 at commit [`3e63aac`](https://github.com/apache/spark/commit/3e63aac6f248ec890aa9e1e7b2b39e3cf4f5b306).




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712386488


   **[Test build #130014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130014/testReport)** for PR 29959 at commit [`8ac97da`](https://github.com/apache/spark/commit/8ac97daf66bef80f5f108c7283b2f66818c074f2).




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712446027


   **[Test build #130018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130018/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728334094


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728334094








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705947952


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34177/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705943824


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34177/
   




[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729960357


   Thanks @holdenk for the review & merge!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712466925








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705820278








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704649312


   **[Test build #129484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129484/testReport)** for PR 29959 at commit [`8c8aa81`](https://github.com/apache/spark/commit/8c8aa81b40c2b00404793f7f63fd67db9f999bc9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729324373


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35841/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728338407


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35777/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705767893


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34169/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706458017








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706458009


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34209/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706356527


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34195/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706501512


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129614/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712422081








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706460732








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728334100


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35775/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706433052


   **[Test build #129605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129605/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704682577


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129484/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712466241


   **[Test build #130014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130014/testReport)** for PR 29959 at commit [`8ac97da`](https://github.com/apache/spark/commit/8ac97daf66bef80f5f108c7283b2f66818c074f2).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705971167








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718863576


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35018/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706392738








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712444758








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728339763








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729348658


   **[Test build #131237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131237/testReport)** for PR 29959 at commit [`e9d399d`](https://github.com/apache/spark/commit/e9d399de621a9cbfc32438b3460f78bef0e73de9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710771573


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34547/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706460732








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704664267








[GitHub] [spark] sunchao commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r503493043



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -207,18 +166,14 @@ private[spark] object HadoopFSUtils extends Logging {
     // Note that statuses only include FileStatus for the files and dirs directly under path,
     // and does not include anything else recursively.
     val statuses: Array[FileStatus] = try {
-      fs match {
-        // DistributedFileSystem overrides listLocatedStatus to make 1 single call to namenode
-        // to retrieve the file status with the file block location. The reason to still fallback
-        // to listStatus is because the default implementation would potentially throw a
-        // FileNotFoundException which is better handled by doing the lookups manually below.
-        case (_: DistributedFileSystem | _: ViewFileSystem) if !ignoreLocality =>
-          val remoteIter = fs.listLocatedStatus(path)
-          new Iterator[LocatedFileStatus]() {
-            def next(): LocatedFileStatus = remoteIter.next
-            def hasNext(): Boolean = remoteIter.hasNext
-          }.toArray
-        case _ => fs.listStatus(path)
+      if (ignoreLocality) {
+        fs.listStatus(path)
+      } else {
+        val remoteIter = fs.listLocatedStatus(path)

Review comment:
       Thanks @steveloughran, yes, I also think it's better to rely on the FileSystem-specific `listLocatedStatus` implementation rather than keeping the logic here. However, the change seems to break a few assumptions in the test cases, so I'll isolate it into another PR.
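
       For readers following the thread, here is a minimal sketch of the simplified branch in the diff above: use the plain listing when locality is ignored, otherwise drain the located listing into an array. To keep it compilable without Hadoop on the classpath, `RemoteIter` is a local stand-in for Hadoop's `RemoteIterator`, and `ListingSketch`, `asScalaIterator` and `listChildren` are hypothetical names, not part of `HadoopFSUtils`.

```scala
object ListingSketch {
  // Local stand-in for Hadoop's RemoteIterator (assumption: only hasNext/next are needed).
  trait RemoteIter[T] {
    def hasNext: Boolean
    def next(): T
  }

  // Adapt the remote iterator to a Scala Iterator, as the diff does inline.
  def asScalaIterator[T](remote: RemoteIter[T]): Iterator[T] = new Iterator[T] {
    def hasNext: Boolean = remote.hasNext
    def next(): T = remote.next()
  }

  // Hypothetical helper mirroring the new branch: the two listing calls are passed
  // in as functions so the sketch does not depend on a real FileSystem.
  def listChildren[T: scala.reflect.ClassTag](
      ignoreLocality: Boolean,
      listStatus: () => Array[T],
      listLocatedStatus: () => RemoteIter[T]): Array[T] = {
    if (ignoreLocality) {
      listStatus()
    } else {
      asScalaIterator(listLocatedStatus()).toArray
    }
  }
}
```

       The point of the design is that the choice between the two calls depends only on `ignoreLocality`, not on the concrete `FileSystem` class, so each implementation's own `listLocatedStatus` decides how block locations are fetched.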






[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729315527


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35841/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706470928


   **[Test build #129606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129606/testReport)** for PR 29959 at commit [`1b4bfbe`](https://github.com/apache/spark/commit/1b4bfbef579b76903d9fd4e6421b754b30f19e80).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718827067


   Retest this please




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712444758








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706441138


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34207/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704628501








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706453097


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129604/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705778991


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34169/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712398514


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34619/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706485127


   **[Test build #129614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129614/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706392738








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728294989








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-731284079


   Hi, @holdenk and @sunchao.
   
   Could you check the Hadoop 2.7 failure?
   - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1609/
   
   ```
   [info] - SPARK-24626 parallel file listing in Stats computation *** FAILED *** (2 seconds, 408 milliseconds)
   [info]   org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 in stage 21.0 (TID 19) had a not serializable result: org.apache.hadoop.fs.Path
   [info] Serialization stack:
   ```
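
   As a rough illustration of the error above: a task result goes through a serializer before it is sent back to the driver, so a result that holds an object whose class is not `java.io.Serializable` fails, while its string form goes through. The sketch below uses plain Java serialization and a made-up `PlainPath` class as a stand-in for a non-serializable path type. `SerializationSketch`, `PlainPath` and `javaSerializable` are hypothetical names, and converting to `String` is only one possible workaround, not necessarily the fix applied here.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationSketch {
  // Hypothetical stand-in for a path type that does not implement java.io.Serializable.
  final class PlainPath(val value: String) {
    override def toString: String = value
  }

  // Returns true if Java serialization of `obj` succeeds, false if some
  // object in its graph is not serializable.
  def javaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // A result carrying the path object itself, versus one carrying its string form.
  val rawResult: (PlainPath, Long) = (new PlainPath("/tmp/data/part-0"), 42L)
  val stringResult: (String, Long) = (rawResult._1.toString, rawResult._2)
}
```

   Here `javaSerializable(rawResult)` fails because the tuple contains a `PlainPath`, whereas `javaSerializable(stringResult)` succeeds because both fields are serializable.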




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705970583


   **[Test build #129571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129571/testReport)** for PR 29959 at commit [`f582b17`](https://github.com/apache/spark/commit/f582b172766233f1554088b14e76ed5363ef7c19).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729161637


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728294979








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728356803








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728375583


   **[Test build #131176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131176/testReport)** for PR 29959 at commit [`cb76047`](https://github.com/apache/spark/commit/cb76047370a71deaa3b1a50e709ffe68a2b2a52a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718832137


   **[Test build #130414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130414/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] sunchao commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r556669791



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -245,27 +208,18 @@ private[spark] object HadoopFSUtils extends Logging {
         Array.empty[FileStatus]
     }
 
-    def doFilter(statuses: Array[FileStatus]) = filterFun match {
-      case Some(shouldFilterOut) =>
-        statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
-      case None =>
-        statuses
-    }
-
-    val filteredStatuses = doFilter(statuses)
     val allLeafStatuses = {
-      val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
+      val (dirs, topLevelFiles) = statuses.partition(_.isDirectory)

Review comment:
       @gengliangwang you're right. Thanks for catching this, and sorry for introducing this regression.
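
A minimal sketch of what the hunk above changes, assuming the regression in question is that entries are no longer filtered before being partitioned into directories and files. `Status` is a hypothetical stand-in for `org.apache.hadoop.fs.FileStatus`, and the `hidden` predicate is an illustrative assumption rather than the filter Spark actually passes in.

```scala
// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
final case class Status(name: String, isDirectory: Boolean)

object LeafFilterSketch {
  // Same shape as the removed doFilter helper: drop entries whose name the
  // optional predicate rejects, or keep everything when no predicate is given.
  def doFilter(
      statuses: Array[Status],
      filterFun: Option[String => Boolean]): Array[Status] = filterFun match {
    case Some(shouldFilterOut) => statuses.filterNot(s => shouldFilterOut(s.name))
    case None => statuses
  }

  def main(args: Array[String]): Unit = {
    val statuses = Array(
      Status("_temporary", isDirectory = true),
      Status("year=2020", isDirectory = true),
      Status(".hidden", isDirectory = false),
      Status("part-0", isDirectory = false))

    // Illustrative predicate: treat names starting with "_" or "." as hidden.
    val hidden: String => Boolean = n => n.startsWith("_") || n.startsWith(".")

    // With the filter applied first, hidden entries never reach the partition.
    val (dirs, files) = doFilter(statuses, Some(hidden)).partition(_.isDirectory)
    println(dirs.map(_.name).mkString(","))  // prints "year=2020"
    println(files.map(_.name).mkString(",")) // prints "part-0"

    // Partitioning the raw statuses keeps the hidden directory as well.
    val (rawDirs, _) = statuses.partition(_.isDirectory)
    println(rawDirs.map(_.name).mkString(",")) // prints "_temporary,year=2020"
  }
}
```

In this sketch, skipping the filter means `_temporary` would be treated as a regular directory and descended into, which is the kind of behavior change a reviewer would flag.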






[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728298010


   **[Test build #131175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131175/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704682573


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705947958








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705779022


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34169/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706441143








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728295428








[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712446711


   @dongjoon-hyun sorry, I was reusing this PR for some testing. It is restored now.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729161650


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35835/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718933021








[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706574395


   Removed the file listing changes (I might open another PR just for that). @holdenk, mind taking another look? Thanks.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704626247


   **[Test build #129479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129479/testReport)** for PR 29959 at commit [`6f7dc79`](https://github.com/apache/spark/commit/6f7dc7954fc686f84f584c20ad9cdc488afee051).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] sunchao commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705823373


   Thanks @holdenk for the review. Yes, this PR still needs a bit more work. Will update.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r508037809



##########
File path: common/network-yarn/pom.xml
##########
@@ -74,6 +74,7 @@
     </dependency>
   </dependencies>
 
+

Review comment:
       This looks like a leftover. Shall we clean this up?






[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712407414


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34621/
   




[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-713023099


   @holdenk @dongjoon-hyun could you take another look at this? I think the test failure is unrelated.




[GitHub] [spark] holdenk commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728227862


   I meant to merge this a while ago, my bad.
   Let's make sure everything still passes in CI.
   Jenkins retest this please.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704659726


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34090/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718933021








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728301817


   **[Test build #131176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131176/testReport)** for PR 29959 at commit [`cb76047`](https://github.com/apache/spark/commit/cb76047370a71deaa3b1a50e709ffe68a2b2a52a).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r508038156



##########
File path: external/kafka-0-10-token-provider/pom.xml
##########
@@ -25,6 +25,7 @@
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
+

Review comment:
       ?

##########
File path: external/kinesis-asl-assembly/pom.xml
##########
@@ -132,6 +132,7 @@
     </dependency>
   </dependencies>
 
+

Review comment:
       ?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706441143








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728334075


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35775/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728295434


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131174/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729349274








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710771585








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705743682


   **[Test build #129563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129563/testReport)** for PR 29959 at commit [`5e299a2`](https://github.com/apache/spark/commit/5e299a2e9e1944dc93bf5aad8c5c76125710cf3b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718876987








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706445912








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704626272


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129479/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728294979


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712422081








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712386488


   **[Test build #130014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130014/testReport)** for PR 29959 at commit [`8ac97da`](https://github.com/apache/spark/commit/8ac97daf66bef80f5f108c7283b2f66818c074f2).




[GitHub] [spark] holdenk commented on a change in pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r501927461



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -57,11 +50,22 @@ private[spark] object HadoopFSUtils extends Logging {
    * @param parallelismMax The maximum parallelism for listing. If the number of input paths is
    *                       larger than this value, parallelism will be throttled to this value
    *                       to avoid generating too many tasks.
-   * @param filterFun Optional predicate on the leaf files. Files who failed the check will be
-   *                  excluded from the results
    * @return for each input path, the set of discovered files for the path
    */
   def parallelListLeafFiles(
+    sc: SparkContext,
+    paths: Seq[Path],
+    hadoopConf: Configuration,
+    filter: PathFilter,
+    ignoreMissingFiles: Boolean,
+    ignoreLocality: Boolean,
+    parallelismThreshold: Int,
+    parallelismMax: Int): Seq[(Path, Seq[FileStatus])] = {
+    parallelListLeafFilesInternal(sc, paths, hadoopConf, filter, true, ignoreMissingFiles,

Review comment:
       nit: For readability I'd pass this as a named parameter, since a bare boolean isn't very clear.
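
       A minimal sketch of the readability point, assuming a hypothetical helper: `listInternal` and its parameter names (`isRootLevel`, `ignoreMissingFiles`) are illustrative stand-ins, not the actual signature of `parallelListLeafFilesInternal`.

```scala
object NamedArgSketch {
  // Hypothetical stand-in for the internal helper; the parameter names are assumptions.
  def listInternal(
      paths: Seq[String],
      isRootLevel: Boolean,
      ignoreMissingFiles: Boolean): String =
    s"paths=${paths.size}, isRootLevel=$isRootLevel, ignoreMissingFiles=$ignoreMissingFiles"

  // Bare booleans: the reader must look up the signature to learn what each one means.
  val positional: String = listInternal(Seq("/a", "/b"), true, false)

  // Named arguments: the call site documents itself and behaves identically.
  val named: String =
    listInternal(Seq("/a", "/b"), isRootLevel = true, ignoreMissingFiles = false)
}
```

       Both calls evaluate to the same value; only the second makes the intent visible at the call site.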

##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -207,18 +166,14 @@ private[spark] object HadoopFSUtils extends Logging {
     // Note that statuses only include FileStatus for the files and dirs directly under path,
     // and does not include anything else recursively.
     val statuses: Array[FileStatus] = try {
-      fs match {
-        // DistributedFileSystem overrides listLocatedStatus to make 1 single call to namenode
-        // to retrieve the file status with the file block location. The reason to still fallback
-        // to listStatus is because the default implementation would potentially throw a
-        // FileNotFoundException which is better handled by doing the lookups manually below.
-        case (_: DistributedFileSystem | _: ViewFileSystem) if !ignoreLocality =>
-          val remoteIter = fs.listLocatedStatus(path)
-          new Iterator[LocatedFileStatus]() {
-            def next(): LocatedFileStatus = remoteIter.next
-            def hasNext(): Boolean = remoteIter.hasNext
-          }.toArray
-        case _ => fs.listStatus(path)
+      if (ignoreLocality) {
+        fs.listStatus(path)
+      } else {
+        val remoteIter = fs.listLocatedStatus(path)

Review comment:
       Is there a chance an FS won't have this implemented, as per the previous code's comment?
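
       One possible way to keep the safety net that the removed comment described is to fall back to the plain listing when the located listing throws `FileNotFoundException`. The sketch below is an assumption-laden illustration, not the PR's code: `Fs` and `RemoteIter` are hypothetical stand-ins for Hadoop's `FileSystem` and `RemoteIterator`, and paths are plain strings.

```scala
import java.io.FileNotFoundException

object ListingFallbackSketch {
  // Hypothetical stand-in for Hadoop's RemoteIterator.
  trait RemoteIter[T] {
    def hasNext: Boolean
    def next(): T
  }

  // Hypothetical minimal file system interface with both listing calls.
  trait Fs {
    def listStatus(path: String): Array[String]
    def listLocatedStatus(path: String): RemoteIter[String]
  }

  // Drain a remote iterator eagerly into a List.
  def drain[T](it: RemoteIter[T]): List[T] = {
    val buf = List.newBuilder[T]
    while (it.hasNext) buf += it.next()
    buf.result()
  }

  // Prefer the located listing, but fall back to the plain listing if a
  // file disappears while locations are being resolved.
  def list(fs: Fs, path: String, ignoreLocality: Boolean): List[String] =
    if (ignoreLocality) {
      fs.listStatus(path).toList
    } else {
      try drain(fs.listLocatedStatus(path))
      catch { case _: FileNotFoundException => fs.listStatus(path).toList }
    }
}
```

       If the directory itself is missing, the fallback `listStatus` call would throw again and the caller still sees the error, so this sketch only masks failures caused by individual entries.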

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
##########
@@ -214,9 +214,9 @@ class FileIndexSuite extends SharedSparkSession {
               assert(leafFiles.isEmpty)
             } else {
               assert(raceCondition == classOf[FileDeletionRaceFileSystem])
-              // One of the two leaf files was missing, but we should still list the other:
-              assert(leafFiles.size == 1)
-              assert(leafFiles.head.getPath == nonDeletedLeafFilePath)
+              // listLocatedStatus will fail as a whole because the default impl calls
+              // getFileBlockLocations
+              assert(leafFiles.isEmpty)

Review comment:
       This seems to indicate the change needs some work.
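
       To illustrate the behavior the old assertion expected (one deleted file should not hide the others), here is a hedged sketch of a per-file lookup that skips entries that vanish mid-listing. `resolve` and `lookup` are hypothetical names, and file paths and block hosts are modeled as plain strings.

```scala
import java.io.FileNotFoundException

object PerFileLookupSketch {
  // Resolve locations file by file; a file deleted during the race is
  // dropped instead of failing the whole listing.
  def resolve(
      files: Seq[String],
      lookup: String => Seq[String]): Seq[(String, Seq[String])] =
    files.flatMap { f =>
      try Some(f -> lookup(f))
      catch { case _: FileNotFoundException => None }
    }
}
```

       With this shape, a listing over one deleted and one surviving leaf file would still return the surviving one, which is what the removed assertions checked.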






[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705930551


   **[Test build #129571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129571/testReport)** for PR 29959 at commit [`f582b17`](https://github.com/apache/spark/commit/f582b172766233f1554088b14e76ed5363ef7c19).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706453094


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712444101


   **[Test build #130012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130012/testReport)** for PR 29959 at commit [`d3f7f73`](https://github.com/apache/spark/commit/d3f7f7338c3c9af9810cdf7dbe6564d1a1adfdd5).
    * This patch **fails Spark unit tests**.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718876987








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710764579


   **[Test build #129942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129942/testReport)** for PR 29959 at commit [`3e63aac`](https://github.com/apache/spark/commit/3e63aac6f248ec890aa9e1e7b2b39e3cf4f5b306).




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712509802








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710784507








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728323043


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35775/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706442480


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34208/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706356545








[GitHub] [spark] sunchao commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706429983


   retest this please




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r508038023



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -66,6 +66,7 @@ private[deploy] object SparkSubmitAction extends Enumeration {
   val SUBMIT, KILL, REQUEST_STATUS, PRINT_VERSION = Value
 }
 
+

Review comment:
       ?






[GitHub] [spark] gengliangwang commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r556432426



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -245,27 +208,18 @@ private[spark] object HadoopFSUtils extends Logging {
         Array.empty[FileStatus]
     }
 
-    def doFilter(statuses: Array[FileStatus]) = filterFun match {
-      case Some(shouldFilterOut) =>
-        statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
-      case None =>
-        statuses
-    }
-
-    val filteredStatuses = doFilter(statuses)
     val allLeafStatuses = {
-      val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
+      val (dirs, topLevelFiles) = statuses.partition(_.isDirectory)

Review comment:
       @sunchao the `dirs` here may contain hidden directories. We still need to filter them before listing leaf files.
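
       A small sketch of the concern, under simplifying assumptions: `Entry` is a hypothetical stand-in for `FileStatus`, and `isHidden` uses a simplified convention (names starting with `_` or `.`), which may not match Spark's actual filter and its exceptions.

```scala
object HiddenDirFilterSketch {
  // Hypothetical stand-in for FileStatus: just a name and a directory flag.
  final case class Entry(name: String, isDirectory: Boolean)

  // Assumed, simplified convention for hidden entries.
  def isHidden(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Drop hidden entries first, then split into (dirs, topLevelFiles) so that
  // hidden directories are never recursed into.
  def split(statuses: Seq[Entry]): (Seq[Entry], Seq[Entry]) = {
    val visible = statuses.filterNot(e => isHidden(e.name))
    visible.partition(_.isDirectory)
  }
}
```

       Partitioning the unfiltered statuses instead would put a directory such as `_temporary` into `dirs`, and its contents would then be listed as leaf files.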






[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706453094








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704649312


   **[Test build #129484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129484/testReport)** for PR 29959 at commit [`8c8aa81`](https://github.com/apache/spark/commit/8c8aa81b40c2b00404793f7f63fd67db9f999bc9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729349274








[GitHub] [spark] holdenk commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729939416


   K8s failures are unrelated; this does not change any of the decommissioning logic. I'll work on a follow-up to the decommissioning failures.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706501505








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729324352


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35841/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705779006


   Merged build finished. Test FAILed.




[GitHub] [spark] holdenk commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728295596


   Can you rebase this?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706471137


   Merged build finished. Test FAILed.




[GitHub] [spark] sunchao commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706484435


   Removed the changes to file listing (I may deal with that in a separate PR). @holdenk, mind taking another look? Thanks.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706501281


   **[Test build #129614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129614/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712466925








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706485127


   **[Test build #129614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129614/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712411283


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34619/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712465445


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34625/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706348232


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34195/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704628501








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705930551


   **[Test build #129571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129571/testReport)** for PR 29959 at commit [`f582b17`](https://github.com/apache/spark/commit/f582b172766233f1554088b14e76ed5363ef7c19).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704626267


   Merged build finished. Test FAILed.




[GitHub] [spark] steveloughran commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-707163511


   One thing to know here is that while `listStatus` etc. is slow against object stores, it's not enough of a bandwidth killer that you get much speedup from executing it across a Spark cluster. Many threads in the same process: yes. Remote execution: no.
   
   I am going to make the class `org.apache.hadoop.mapred.LocatedFileStatusFetcher` @Public/Evolving in this PR: https://github.com/apache/hadoop/pull/2324
   It does multithreaded scanning today and is fairly stable.
   
   It's not the perfect API: I'd have a builder and actually return a RemoteIterator of statuses which would block while new listing entries arrived, but it's there. With that PR it will also collect aggregate stats on IO operations from any FS whose list calls provide them.
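
To illustrate the "many threads in the same process" option, here is a minimal sketch in plain Scala. It does not use `LocatedFileStatusFetcher` or any Hadoop API; `slowList`, `listAll`, the directory names and the 50 ms latency are all hypothetical stand-ins for a slow per-directory list call.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThreadedListingSketch {
  // Hypothetical stand-in for a slow per-directory list call against an object store.
  def slowList(dir: String): Seq[String] = {
    Thread.sleep(50) // simulated round-trip latency
    Seq(s"$dir/part-0", s"$dir/part-1")
  }

  // List all directories concurrently on a local thread pool, inside one JVM.
  def listAll(dirs: Seq[String], threads: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = dirs.map(d => Future(slowList(d)))
      // Future.sequence keeps the input order, so results line up with `dirs`.
      Await.result(Future.sequence(futures), 30.seconds).flatten
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(listAll(Seq("/a", "/b", "/c"), threads = 3))
}
```

The list calls here are latency-bound rather than bandwidth-bound, so a local pool overlaps the waits without any remote execution.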




[GitHub] [spark] sunchao commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r508038192



##########
File path: common/network-yarn/pom.xml
##########
@@ -74,6 +74,7 @@
     </dependency>
   </dependencies>
 
+

Review comment:
       Please ignore this - I'll revert this commit after the testing is done






[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704625670


   **[Test build #129479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129479/testReport)** for PR 29959 at commit [`6f7dc79`](https://github.com/apache/spark/commit/6f7dc7954fc686f84f584c20ad9cdc488afee051).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712446027


   **[Test build #130018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130018/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] sunchao commented on a change in pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r502008562



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -207,18 +166,14 @@ private[spark] object HadoopFSUtils extends Logging {
     // Note that statuses only include FileStatus for the files and dirs directly under path,
     // and does not include anything else recursively.
     val statuses: Array[FileStatus] = try {
-      fs match {
-        // DistributedFileSystem overrides listLocatedStatus to make 1 single call to namenode
-        // to retrieve the file status with the file block location. The reason to still fallback
-        // to listStatus is because the default implementation would potentially throw a
-        // FileNotFoundException which is better handled by doing the lookups manually below.
-        case (_: DistributedFileSystem | _: ViewFileSystem) if !ignoreLocality =>
-          val remoteIter = fs.listLocatedStatus(path)
-          new Iterator[LocatedFileStatus]() {
-            def next(): LocatedFileStatus = remoteIter.next
-            def hasNext(): Boolean = remoteIter.hasNext
-          }.toArray
-        case _ => fs.listStatus(path)
+      if (ignoreLocality) {
+        fs.listStatus(path)
+      } else {
+        val remoteIter = fs.listLocatedStatus(path)

Review comment:
       Yeah, an FS can choose not to implement it (although all the main ones override it). If it is not implemented, the call falls back to the default impl in `FileSystem`, which basically calls `listStatus` and then `getFileBlockLocations` on each `FileStatus` received. The behavior is very similar to what this class does later on.
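
The fallback described above can be sketched with a toy model. This is not the real `org.apache.hadoop.fs.FileSystem`; `ToyFileSystem`, `ToyStatus` and `ToyLocatedStatus` are hypothetical names, and only the list-then-locate shape of the default behavior is modeled.

```scala
import java.io.FileNotFoundException

// Hypothetical stand-ins for Hadoop's FileStatus and LocatedFileStatus.
final case class ToyStatus(path: String)
final case class ToyLocatedStatus(path: String, hosts: Seq[String])

// Hypothetical in-memory file system: maps each file path to its block hosts.
class ToyFileSystem(blocks: Map[String, Seq[String]]) {
  // Lists the files under `dir`, sorted by path.
  def listStatus(dir: String): Seq[ToyStatus] =
    blocks.keys.filter(_.startsWith(dir + "/")).toSeq.sorted.map(ToyStatus(_))

  // Looks up block hosts for one file; throws if the file is gone.
  def getFileBlockLocations(status: ToyStatus): Seq[String] =
    blocks.getOrElse(status.path, throw new FileNotFoundException(status.path))

  // Default behavior as described above: list first, then fetch locations per entry.
  def listLocatedStatus(dir: String): Seq[ToyLocatedStatus] =
    listStatus(dir).map(s => ToyLocatedStatus(s.path, getFileBlockLocations(s)))
}

object DefaultListingSketch {
  def main(args: Array[String]): Unit = {
    val fs = new ToyFileSystem(Map("/t/a" -> Seq("host1"), "/t/b" -> Seq("host2")))
    println(fs.listLocatedStatus("/t"))
  }
}
```

A file system that overrides `listLocatedStatus` could return the same data in a single call, which is the advantage the removed comment in the diff attributes to `DistributedFileSystem`.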






[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718832137


   **[Test build #130414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130414/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] holdenk commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728294558


   Jenkins retest this please.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704664256


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34090/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728339734


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35776/
   




[GitHub] [spark] sunchao commented on a change in pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r502012000



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
##########
@@ -214,9 +214,9 @@ class FileIndexSuite extends SharedSparkSession {
               assert(leafFiles.isEmpty)
             } else {
               assert(raceCondition == classOf[FileDeletionRaceFileSystem])
-              // One of the two leaf files was missing, but we should still list the other:
-              assert(leafFiles.size == 1)
-              assert(leafFiles.head.getPath == nonDeletedLeafFilePath)
+              // listLocatedStatus will fail as a whole because the default impl calls
+              // getFileBlockLocations
+              assert(leafFiles.isEmpty)

Review comment:
       Yes, this test checks the case where a file is deleted after a `listStatus` call but before a subsequent `getFileBlockLocations` call when locality info is needed. With the new impl, we'd call `listLocatedStatus` instead, which calls `getFileBlockLocations` internally, and thus the `listLocatedStatus` call (as a whole) fails with `FileNotFoundException`.
   
   As explained in the PR description, the behavior will be different when `spark.sql.files.ignoreMissingFiles` is set, although I think we currently don't give any guarantee when there are missing files during listing, so either is acceptable? Anyway, I'm happy to remove this change if there is any concern.
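
The difference between the two behaviors can be shown with a toy model of the race. This is a sketch only: `RaceSketch` and its methods are hypothetical and do not reproduce `FileDeletionRaceFileSystem` or the real Hadoop calls; the listing still reports a file whose block locations can no longer be fetched.

```scala
import java.io.FileNotFoundException
import scala.util.Try

object RaceSketch {
  // The listing still reports "/t/deleted", but its locations are gone,
  // mimicking a file removed between the two calls.
  private val listed = Seq("/t/deleted", "/t/kept")
  private val locations = Map("/t/kept" -> Seq("host1"))

  def listStatus(): Seq[String] = listed

  def getFileBlockLocations(path: String): Seq[String] =
    locations.getOrElse(path, throw new FileNotFoundException(path))

  // Combined call: one missing file makes the whole listing throw.
  def listLocatedStatus(): Seq[(String, Seq[String])] =
    listStatus().map(p => p -> getFileBlockLocations(p))

  // Per-file lookup: a missing file is skipped and the others are kept.
  def listThenLocate(): Seq[(String, Seq[String])] =
    listStatus().flatMap(p => Try(p -> getFileBlockLocations(p)).toOption)

  def main(args: Array[String]): Unit = {
    println(Try(listLocatedStatus()).isFailure)
    println(listThenLocate())
  }
}
```

In this model the combined call loses the surviving file as well, which corresponds to the updated assertion `leafFiles.isEmpty`, while the per-file variant corresponds to the old assertion that one leaf file is still listed.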






[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728338399








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704624305


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34085/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710784507








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706328087


   **[Test build #129592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129592/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718876961


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35018/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706471142


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129606/




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704682274


   **[Test build #129484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129484/testReport)** for PR 29959 at commit [`8c8aa81`](https://github.com/apache/spark/commit/8c8aa81b40c2b00404793f7f63fd67db9f999bc9).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705819420


   **[Test build #129563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129563/testReport)** for PR 29959 at commit [`5e299a2`](https://github.com/apache/spark/commit/5e299a2e9e1944dc93bf5aad8c5c76125710cf3b).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712411311








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705820286


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129563/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705820278


   Merged build finished. Test FAILed.




[GitHub] [spark] steveloughran commented on a change in pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r503347661



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -207,18 +166,14 @@ private[spark] object HadoopFSUtils extends Logging {
     // Note that statuses only include FileStatus for the files and dirs directly under path,
     // and does not include anything else recursively.
     val statuses: Array[FileStatus] = try {
-      fs match {
-        // DistributedFileSystem overrides listLocatedStatus to make 1 single call to namenode
-        // to retrieve the file status with the file block location. The reason to still fallback
-        // to listStatus is because the default implementation would potentially throw a
-        // FileNotFoundException which is better handled by doing the lookups manually below.
-        case (_: DistributedFileSystem | _: ViewFileSystem) if !ignoreLocality =>
-          val remoteIter = fs.listLocatedStatus(path)
-          new Iterator[LocatedFileStatus]() {
-            def next(): LocatedFileStatus = remoteIter.next
-            def hasNext(): Boolean = remoteIter.hasNext
-          }.toArray
-        case _ => fs.listStatus(path)
+      if (ignoreLocality) {
+        fs.listStatus(path)
+      } else {
+        val remoteIter = fs.listLocatedStatus(path)

Review comment:
       HDFS and S3A both do this; ABFS merits minor optimisation too. Because they return a remote iterator they can do paged fetch of data
   * HDFS/webHDFS: paged download for better scalability
   * S3A (3.3.1+): async prefetch of next page of data
   
   ABFS should copy the S3A approach; its listing API is paged too.
   
   Better to rely on the FS to do the work *but make clear you expect the maintainers to do so*
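
   A minimal sketch of the paged-fetch idea described above, assuming nothing about any real connector. The local `RemoteIterator` trait is a stand-in for `org.apache.hadoop.fs.RemoteIterator`, and `PagedListing` and `RemoteIterators.asScala` are hypothetical names used only for illustration. The point is that wrapping the remote iterator lazily lets the file system decide when each page is fetched:

```scala
// Stand-in for org.apache.hadoop.fs.RemoteIterator, declared locally so the
// sketch compiles without Hadoop on the classpath (assumption for illustration).
trait RemoteIterator[T] {
  def hasNext: Boolean
  def next(): T
}

// Hypothetical paged listing: each page costs one simulated remote call.
final class PagedListing(all: Vector[String], pageSize: Int)
    extends RemoteIterator[String] {
  private var offset = 0
  private var page: Iterator[String] = Iterator.empty
  var remoteCalls = 0 // number of simulated round trips so far

  def hasNext: Boolean = {
    // Fetch the next page only when the current one is exhausted.
    if (!page.hasNext && offset < all.length) {
      page = all.slice(offset, offset + pageSize).iterator
      offset += pageSize
      remoteCalls += 1
    }
    page.hasNext
  }

  def next(): String = {
    if (!hasNext) throw new NoSuchElementException("listing exhausted")
    page.next()
  }
}

object RemoteIterators {
  // Adapts a RemoteIterator to a Scala Iterator without draining it eagerly.
  def asScala[T](it: RemoteIterator[T]): Iterator[T] = new Iterator[T] {
    def hasNext: Boolean = it.hasNext
    def next(): T = it.next()
  }
}
```

   Calling `.toArray` on the adapted iterator, as the diff does, still drains every page; the laziness only pays off if the caller consumes entries incrementally.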






[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712377119


   **[Test build #130012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130012/testReport)** for PR 29959 at commit [`d3f7f73`](https://github.com/apache/spark/commit/d3f7f7338c3c9af9810cdf7dbe6564d1a1adfdd5).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712474359


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34625/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706328087


   **[Test build #129592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129592/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728327420


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35776/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706391796


   **[Test build #129592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129592/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706491776


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34217/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706471137








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712422060


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34621/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705971167








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705779006








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710784234


   **[Test build #129942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129942/testReport)** for PR 29959 at commit [`3e63aac`](https://github.com/apache/spark/commit/3e63aac6f248ec890aa9e1e7b2b39e3cf4f5b306).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728376486








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728356803








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729302391


   **[Test build #131237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131237/testReport)** for PR 29959 at commit [`e9d399d`](https://github.com/apache/spark/commit/e9d399de621a9cbfc32438b3460f78bef0e73de9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728339770


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35776/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712474382








[GitHub] [spark] sunchao commented on a change in pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r502007561



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -57,11 +50,22 @@ private[spark] object HadoopFSUtils extends Logging {
    * @param parallelismMax The maximum parallelism for listing. If the number of input paths is
    *                       larger than this value, parallelism will be throttled to this value
    *                       to avoid generating too many tasks.
-   * @param filterFun Optional predicate on the leaf files. Files who failed the check will be
-   *                  excluded from the results
    * @return for each input path, the set of discovered files for the path
    */
   def parallelListLeafFiles(
+    sc: SparkContext,
+    paths: Seq[Path],
+    hadoopConf: Configuration,
+    filter: PathFilter,
+    ignoreMissingFiles: Boolean,
+    ignoreLocality: Boolean,
+    parallelismThreshold: Int,
+    parallelismMax: Int): Seq[(Path, Seq[FileStatus])] = {
+    parallelListLeafFilesInternal(sc, paths, hadoopConf, filter, true, ignoreMissingFiles,

Review comment:
       yup will do

##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -207,18 +166,14 @@ private[spark] object HadoopFSUtils extends Logging {
     // Note that statuses only include FileStatus for the files and dirs directly under path,
     // and does not include anything else recursively.
     val statuses: Array[FileStatus] = try {
-      fs match {
-        // DistributedFileSystem overrides listLocatedStatus to make 1 single call to namenode
-        // to retrieve the file status with the file block location. The reason to still fallback
-        // to listStatus is because the default implementation would potentially throw a
-        // FileNotFoundException which is better handled by doing the lookups manually below.
-        case (_: DistributedFileSystem | _: ViewFileSystem) if !ignoreLocality =>
-          val remoteIter = fs.listLocatedStatus(path)
-          new Iterator[LocatedFileStatus]() {
-            def next(): LocatedFileStatus = remoteIter.next
-            def hasNext(): Boolean = remoteIter.hasNext
-          }.toArray
-        case _ => fs.listStatus(path)
+      if (ignoreLocality) {
+        fs.listStatus(path)
+      } else {
+        val remoteIter = fs.listLocatedStatus(path)

Review comment:
       yeah a FS can choose not to implement it (although all the main ones override this). If not implemented it will fall back to the default impl in `FileSystem`, which basically calls `listStatus` and then `getFileBlockLocations` on each `FileStatus` received. The behavior is very similar to what this class is doing later on.
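
   The fallback described above can be sketched roughly as follows. This is a simplified model rather than the Hadoop source: `Status`, `Located`, `MiniFs` and `CountingFs` are hypothetical stand-ins for `FileStatus`, `LocatedFileStatus` and a `FileSystem` implementation, used so the example runs without Hadoop on the classpath.

```scala
// Hypothetical, simplified stand-ins for Hadoop's FileStatus and LocatedFileStatus.
final case class Status(path: String, length: Long, isDir: Boolean)
final case class Located(status: Status, hosts: Seq[String])

trait MiniFs {
  def listStatus(dir: String): Seq[Status]
  def getFileBlockLocations(s: Status, start: Long, len: Long): Seq[String]

  // Fallback as described in the comment: one listing call, then one
  // location lookup per file (directories carry no block locations here).
  def listLocatedStatus(dir: String): Iterator[Located] =
    listStatus(dir).iterator.map { s =>
      val hosts =
        if (s.isDir) Seq.empty[String]
        else getFileBlockLocations(s, 0L, s.length)
      Located(s, hosts)
    }
}

// Toy in-memory file system that counts location lookups.
final class CountingFs(entries: Seq[Status]) extends MiniFs {
  var locationLookups = 0
  def listStatus(dir: String): Seq[Status] = entries
  def getFileBlockLocations(s: Status, start: Long, len: Long): Seq[String] = {
    locationLookups += 1
    Seq("host-" + s.path)
  }
}
```

   In this model a directory with N files costs one listing plus N location lookups, which is the per-file pattern the comment compares to what `HadoopFSUtils` does later on.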

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
##########
@@ -214,9 +214,9 @@ class FileIndexSuite extends SharedSparkSession {
               assert(leafFiles.isEmpty)
             } else {
               assert(raceCondition == classOf[FileDeletionRaceFileSystem])
-              // One of the two leaf files was missing, but we should still list the other:
-              assert(leafFiles.size == 1)
-              assert(leafFiles.head.getPath == nonDeletedLeafFilePath)
+              // listLocatedStatus will fail as a whole because the default impl calls
+              // getFileBlockLocations
+              assert(leafFiles.isEmpty)

Review comment:
       Yes this test checks the case where a file was deleted after a `listStatus` call but before a subsequent `getFileBlockLocations` when locality info is needed. With the new impl, we'd call `listLocatedStatus` instead which will call `getFileBlockLocations` internally, and thus the `listLocatedStatus` call (as a whole) fails with `FileNotFoundException`. 
   
   As explained in the PR description, the behavior will be different when `spark.sql.files.ignoreMissingFiles` is set, although I think we currently don't give any guarantee when there are missing files during listing, so either is acceptable? Anyway, I'm happy to remove this change if there is any concern.
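
   To make the difference concrete, here is a small sketch of the two behaviors under stated assumptions. `MissingFileListing`, `perFile`, `wholeCall` and the `locate` function are hypothetical names, not Spark or Hadoop APIs; `locate` models a block-location lookup that throws `FileNotFoundException` when a file vanished after it was listed.

```scala
import java.io.FileNotFoundException

object MissingFileListing {
  // Per-file lookups: only the file that disappeared is dropped,
  // matching the old test expectation of one surviving leaf file.
  def perFile(
      files: Seq[String],
      locate: String => Seq[String]): Seq[(String, Seq[String])] =
    files.flatMap { f =>
      try List(f -> locate(f))
      catch { case _: FileNotFoundException => Nil }
    }

  // Single combined call: one missing file fails the whole listing, so with
  // ignoreMissingFiles set the result for the directory is empty.
  def wholeCall(
      files: Seq[String],
      locate: String => Seq[String],
      ignoreMissingFiles: Boolean): Seq[(String, Seq[String])] =
    try files.toVector.map(f => f -> locate(f)) // strict, so failures surface here
    catch {
      case e: FileNotFoundException =>
        if (ignoreMissingFiles) Seq.empty else throw e
    }
}
```

   With one of two files deleted mid-listing, `perFile` returns the surviving file while `wholeCall(..., ignoreMissingFiles = true)` returns nothing, which mirrors the change in the test assertion from `leafFiles.size == 1` to `leafFiles.isEmpty`.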






[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-731291745


   found potential issue and opened #30447




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706436943


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34207/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705947958








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705743682


   **[Test build #129563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129563/testReport)** for PR 29959 at commit [`5e299a2`](https://github.com/apache/spark/commit/5e299a2e9e1944dc93bf5aad8c5c76125710cf3b).




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705779006








[GitHub] [spark] dongjoon-hyun commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-731284079


   Hi, @holdenk and @sunchao .
   
   Could you check the Hadoop 2.7 failure?
   - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1609/




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728298010


   **[Test build #131175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131175/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729302391


   **[Test build #131237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131237/testReport)** for PR 29959 at commit [`e9d399d`](https://github.com/apache/spark/commit/e9d399de621a9cbfc32438b3460f78bef0e73de9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728356886


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131175/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712509812








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706491780








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729324367


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706356545








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-705743682








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704626267








[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706455638


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34209/
   




[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-720726820


   Thanks @holdenk for taking a look. Could you help merge this when you have time?




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704664267








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728339763


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706491780








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710769195


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34547/
   




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706433052


   **[Test build #129605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129605/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-710771585








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-729324367








[GitHub] [spark] sunchao commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-731285760


   Thanks @dongjoon-hyun, let me take a look.




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-704625670


   **[Test build #129479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129479/testReport)** for PR 29959 at commit [`6f7dc79`](https://github.com/apache/spark/commit/6f7dc7954fc686f84f584c20ad9cdc488afee051).




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706447924


   **[Test build #129606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129606/testReport)** for PR 29959 at commit [`1b4bfbe`](https://github.com/apache/spark/commit/1b4bfbef579b76903d9fd4e6421b754b30f19e80).




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706447924


   **[Test build #129606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129606/testReport)** for PR 29959 at commit [`1b4bfbe`](https://github.com/apache/spark/commit/1b4bfbef579b76903d9fd4e6421b754b30f19e80).




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706452836


   **[Test build #129604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129604/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706445908


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34208/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728338399


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706460429


   **[Test build #129605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129605/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706458017








[GitHub] [spark] SparkQA removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728301817


   **[Test build #131176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131176/testReport)** for PR 29959 at commit [`cb76047`](https://github.com/apache/spark/commit/cb76047370a71deaa3b1a50e709ffe68a2b2a52a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-712411311








[GitHub] [spark] sunchao commented on a change in pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29959:
URL: https://github.com/apache/spark/pull/29959#discussion_r502007561



##########
File path: core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala
##########
@@ -57,11 +50,22 @@ private[spark] object HadoopFSUtils extends Logging {
    * @param parallelismMax The maximum parallelism for listing. If the number of input paths is
    *                       larger than this value, parallelism will be throttled to this value
    *                       to avoid generating too many tasks.
-   * @param filterFun Optional predicate on the leaf files. Files who failed the check will be
-   *                  excluded from the results
    * @return for each input path, the set of discovered files for the path
    */
   def parallelListLeafFiles(
+    sc: SparkContext,
+    paths: Seq[Path],
+    hadoopConf: Configuration,
+    filter: PathFilter,
+    ignoreMissingFiles: Boolean,
+    ignoreLocality: Boolean,
+    parallelismThreshold: Int,
+    parallelismMax: Int): Seq[(Path, Seq[FileStatus])] = {
+    parallelListLeafFilesInternal(sc, paths, hadoopConf, filter, true, ignoreMissingFiles,

Review comment:
       Yup, will do.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-728376495








[GitHub] [spark] AmplabJenkins commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706445912








[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706490123


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34217/
   




[GitHub] [spark] SparkQA commented on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706419281


   **[Test build #129604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129604/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] SparkQA removed a comment on pull request #29959: [WIP][SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-706419281


   **[Test build #129604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129604/testReport)** for PR 29959 at commit [`e10e59f`](https://github.com/apache/spark/commit/e10e59f117dec07fc1839476fd5907558d781612).




[GitHub] [spark] SparkQA commented on pull request #29959: [SPARK-32381][CORE][SQL][FOLLOWUP] More cleanup on HadoopFSUtils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29959:
URL: https://github.com/apache/spark/pull/29959#issuecomment-718932008


   **[Test build #130414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130414/testReport)** for PR 29959 at commit [`be1517e`](https://github.com/apache/spark/commit/be1517e9a1cde0c6d69071b2a64d68e4e5fa6a1a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

