Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/22 22:14:42 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

dongjoon-hyun opened a new pull request #28897:
URL: https://github.com/apache/spark/pull/28897


   ### What changes were proposed in this pull request?
   
   This PR aims to switch the default Apache Hadoop dependency from 2.7.4 to 3.2.0.
   
   ### Why are the changes needed?
   
   Apache Hadoop 3.2 has many fixes and new cloud-friendly features.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Since the default Hadoop dependency changes, users will get better support in cloud environments.
   
   ### How was this patch tested?
   
   Pass the Jenkins.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650403512


   +1 from me. Users still have the option to use Hadoop 2.7, so I feel it's safe.




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877








[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647875507


   Hi, @srowen , @HyukjinKwon , @cloud-fan , @gatorsmile .
   Could you review this please?




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since uploading to PyPI is a manual process, we can keep PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   I'm wondering if I missed anything in the mailing list thread. It would be great if you could answer my question, too. Do you have a specific issue? Could you share it with the community, if possible on the dev mailing list? Then we can try to fix it together in order to move forward.
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PySpark` scope, because I have provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick to Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark will stay the same?




[GitHub] [spark] holdenk commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650418721


   LGTM, we can continue the PyPI discussion separately.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647853176








[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647851310








[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649868767


   Hi, @srowen , @HyukjinKwon , @gatorsmile , @holdenk , @dbtsai .
   Following your comments and advice, I clarified the PR description and focused only on the Apache side. Can we move Apache Spark 3.1 forward? Thank you in advance.




[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649865700


   **[Test build #124523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, PySpark was downloaded 1,333,883 times last week, but we provide those users only Spark with `Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark
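
A minimal sketch of how a PySpark user might check which Hadoop line their installation bundles. It assumes a working local PySpark install; `parse_hadoop_version` and `bundled_hadoop_version` are hypothetical helper names, and `_jvm` is an internal py4j handle rather than a stable API, so this is a diagnostic aid only.

```python
from typing import Tuple


def parse_hadoop_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a Hadoop version string such as '3.2.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def bundled_hadoop_version() -> str:
    """Ask the JVM behind a local PySpark session for its Hadoop version.

    Assumption: PySpark is installed. The import is deferred so that the
    parser above can be used without it.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        # org.apache.hadoop.util.VersionInfo is Hadoop's own version class.
        return spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()
    finally:
        spark.stop()
```

Calling `parse_hadoop_version(bundled_hadoop_version())` and comparing the major number against 3 would tell a user whether they are on the Hadoop 2 or Hadoop 3 line.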




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650428500


   Thank you so much, @holdenk ! Yes, we can discuss and improve it separately later.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since uploading to PyPI is a manual process, we can keep PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   I'm wondering if I missed anything in the mailing list thread. It would be great if you could answer my question, too. Do you have a specific issue? Could you share it with the community, if possible on the dev mailing list? Then we can try to fix it together in order to move forward.
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PyPi` scope, because I have provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick to Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark can stay the same as in Spark 3.0.0?




[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667


   Yes. As you said, the default version is very important for PySpark users. 
   
   We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017




[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647952007


   We should avoid forcing the current PySpark users to upgrade their Hadoop versions. If we change the default, will it impact them? If YES, I think we should not do it until it is ready and they have a workaround. 
   
   




[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649865700


   **[Test build #124523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647904179


   For that, I'm thinking about new features like the following, but the required items vary with each user's situation.
   - [HADOOP-13786](https://issues.apache.org/jira/browse/HADOOP-13786) Add S3A committers for zero-rename commits to S3 endpoints
   - [HADOOP-13075](https://issues.apache.org/jira/browse/HADOOP-13075) Add support for SSE-KMS and SSE-C in s3a filesystem
   - [HADOOP-13578](https://issues.apache.org/jira/browse/HADOOP-13578) Add Codec for ZStandard Compression (This is not cloud-specific)
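
A minimal sketch of how the first two features above might be switched on from a Spark application, assuming the S3A connector is on the classpath. The `fs.s3a.*` keys are Hadoop 3.x S3A property names; `s3a_cloud_conf` and its parameters are hypothetical names for illustration, and a real deployment will likely need further settings (for example, Spark-side commit protocol bindings).

```python
from typing import Dict, Optional


def s3a_cloud_conf(committer: str = "directory",
                   kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Build Spark conf entries for S3A features that need Hadoop 3.x.

    The `spark.hadoop.` prefix makes Spark copy a key into the Hadoop
    configuration. Function and parameter names are hypothetical.
    """
    allowed = {"directory", "partitioned", "magic"}
    if committer not in allowed:
        raise ValueError(f"unknown S3A committer: {committer!r}")

    conf = {
        # HADOOP-13786: pick one of the zero-rename S3A committers.
        "spark.hadoop.fs.s3a.committer.name": committer,
    }
    if kms_key_id is not None:
        # HADOOP-13075: server-side encryption with a KMS-managed key.
        conf["spark.hadoop.fs.s3a.server-side-encryption-algorithm"] = "SSE-KMS"
        conf["spark.hadoop.fs.s3a.server-side-encryption.key"] = kms_key_id
    return conf


if __name__ == "__main__":
    # "example-key-id" is a placeholder, not a real key identifier.
    for key, value in sorted(s3a_cloud_conf("directory", "example-key-id").items()):
        print(f"{key}={value}")
```

Each resulting pair could be passed to `SparkSession.builder.config(key, value)`. With a Hadoop 2.7 build these keys have no effect, because the committers and SSE-KMS support are not present there.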




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647904179


   For that, I'm thinking about new features like the following, but the required items vary with each user's situation.
   - [HADOOP-13786](https://issues.apache.org/jira/browse/HADOOP-13786) Add S3A committers for zero-rename commits to S3 endpoints
   - [HADOOP-13075](https://issues.apache.org/jira/browse/HADOOP-13075) Add support for SSE-KMS and SSE-C in s3a filesystem




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since uploading to PyPI is a manual process, we can keep PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   Did I miss anything in the mailing list thread? It would be great if you could answer my question, too. Do you have a specific issue?
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PySpark` scope, because I have provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick to Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark will stay the same?




[GitHub] [spark] HyukjinKwon commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647942729


   ^ FWIW, I'm aiming to provide a way to control it in Spark 3.1 via SPARK-32017.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647798387








[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906173


   **[Test build #124523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124523/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647797755


   **[Test build #124371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124371/testReport)** for PR 28897 at commit [`9663de5`](https://github.com/apache/spark/commit/9663de5370400bfb1efc0e821de2d6298901a3fc).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since uploading to PyPI is a manual process, we can keep PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   I'm wondering if I missed anything in the mailing list thread. It would be great if you could answer my question, too. Do you have a specific issue? Could you share it with the community, if possible on the dev mailing list? Then we can try to fix it together in order to move forward.
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PyPi` scope, because I have provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick to Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark will stay the same?




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647949881


   BTW, if you want to have a `Hadoop 2.7` variant in a `Hadoop 3.2 (default)` environment, we had better revise the JIRA issue.




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647949279


   @gatorsmile, why does that block this? Technically, this supersedes it, doesn't it?
   > We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017
   
   Switching the default is the change that really matters. For example, we shipped Scala 2.12 builds in the Spark 2.4.x line for a while, but we didn't notice the Scala function issue until the 3.0.0 release.
   
   Also, we can switch back to `Hadoop 2.7` before December if we want.




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649864673


   Retest this please.




[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647878276


   > the users will get a better support in a cloud environment.
   
   Can you explain the details? 




[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648611417


   **[Test build #124445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124445/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648542765


   **[Test build #124445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124445/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, PySpark was downloaded 1,333,883 times last week, but it is distributed only as Spark with `Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648612572








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649906877








[GitHub] [spark] dongjoon-hyun closed pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #28897:
URL: https://github.com/apache/spark/pull/28897


   




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since PyPI uploading is a manual process, we can keep PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about Hadoop 2.7.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   Did I miss anything in the mailing thread? It would be great if you could answer my question, too. Do you have a specific issue?
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PySpark` scope because I provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick with Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark will stay the same?




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647904179


   For that, I'm thinking about new features like the following, but the relevant items vary based on the user's situation.
   - [HADOOP-13786](https://issues.apache.org/jira/browse/HADOOP-13786) Add S3A committers for zero-rename commits to S3 endpoints
   - [HADOOP-13075](https://issues.apache.org/jira/browse/HADOOP-13075) Add support for SSE-KMS and SSE-C in s3a filesystem




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648541201








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648612572


   Merged build finished. Test FAILed.




[GitHub] [spark] gatorsmile commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647955466


   Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR? For example, some incompatibility issues introduced by Hadoop 3.x. 
   
   This PR did not answer this important question in the PR description. We need to answer this before doing any further action. 
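
One way a PySpark user might check which Hadoop line an installation bundles, before and after such an upgrade, is sketched below. This is only an illustration: the helper names are hypothetical, and `bundled_hadoop_version` assumes PySpark's private `_jvm` gateway attribute and Hadoop's `org.apache.hadoop.util.VersionInfo` class are reachable, which is not a supported public API.

```python
def hadoop_major_minor(version):
    """Parse a Hadoop version string such as '3.2.0' or '2.7.4-SNAPSHOT'
    into a (major, minor) tuple of ints."""
    # Drop any qualifier after '-' and keep the first two numeric fields.
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])


def bundled_hadoop_version():
    """Ask the JVM which Hadoop version this PySpark install was built with.

    Assumption: pyspark is installed; the import is deferred so the parsing
    helper above stays usable without it. `_jvm` is a private attribute.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        return spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()
    finally:
        spark.stop()
```

With PySpark installed, `hadoop_major_minor(bundled_hadoop_version())` would show whether an environment is on the Hadoop 2.7 or Hadoop 3.2 line.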




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648541201








[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647802479


   **[Test build #124372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124372/testReport)** for PR 28897 at commit [`caf50d1`](https://github.com/apache/spark/commit/caf50d12e41318dba32a1340793ee98379009f8c).




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647912232


   BTW, please note that the default version is very important. For example, PySpark is downloaded 1,333,883 times, but we provide those users only Spark with `Hadoop 2.7.4`.
   - https://pypistats.org/packages/pyspark




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to keep the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. PyPI uploading is a manual process, so we can then keep PySpark with Hadoop 2.7 on PyPI. Is that enough?
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
   
   I haven't seen specific complaints about the following. Instead, I've seen many complaints about Hadoop 2.7.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?
   
   Did I miss anything in the mailing thread? It would be great if you could answer my question, too. Do you have a specific issue?
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PySpark` scope because I provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick with Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark will stay the same?




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647798387








[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647953126


   I'm wondering what impact you are specifically worried about, @gatorsmile?




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647802922








[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650476825


   Thank you all. Merged to master.




[GitHub] [spark] gatorsmile edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
gatorsmile edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647946667


   Yes. As you said, the default version is very important for PySpark users. I am afraid there are breaking changes in Hadoop 3.x releases. 
   
   We should avoid making this change until we can resolve https://issues.apache.org/jira/browse/SPARK-32017




[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647850610


   **[Test build #124371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124371/testReport)** for PR 28897 at commit [`9663de5`](https://github.com/apache/spark/commit/9663de5370400bfb1efc0e821de2d6298901a3fc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649866080








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647802922








[GitHub] [spark] SparkQA commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647852600


   **[Test build #124372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124372/testReport)** for PR 28897 at commit [`caf50d1`](https://github.com/apache/spark/commit/caf50d12e41318dba32a1340793ee98379009f8c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648612580


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124445/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647851310








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647904179


   For that, I'm thinking about new features like the following, but the required items vary based on the user's situation.
   - [HADOOP-13786](https://issues.apache.org/jira/browse/HADOOP-13786) Add S3A committers for zero-rename commits to S3 endpoints
   - [HADOOP-13075](https://issues.apache.org/jira/browse/HADOOP-13075) Add support for SSE-KMS and SSE-C in s3a filesystem
   - [HADOOP-13578](https://issues.apache.org/jira/browse/HADOOP-13578) Add Codec for ZStandard Compression
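
As a rough illustration of how the first two items above might surface in a Spark job, the sketch below collects the relevant settings into a plain dictionary. The helper name and its parameters are hypothetical. The property names are taken from the Hadoop S3A documentation for the 3.2 line and should be checked against the Hadoop version actually deployed; the `spark.hadoop.` prefix is Spark's convention for passing a property through to the Hadoop configuration.

```python
def hadoop32_cloud_conf(kms_key_arn=None, committer="directory"):
    """Return Spark conf entries for S3A committers and optional SSE-KMS.

    Hypothetical helper for illustration only; it builds a dict and does
    not contact S3 or start Spark.
    """
    if committer not in ("directory", "partitioned", "magic"):
        raise ValueError(f"unknown S3A committer: {committer}")

    conf = {
        # HADOOP-13786: choose a zero-rename S3A committer.
        "spark.hadoop.fs.s3a.committer.name": committer,
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    }
    if kms_key_arn is not None:
        # HADOOP-13075: server-side encryption with a KMS-managed key.
        conf["spark.hadoop.fs.s3a.server-side-encryption-algorithm"] = "SSE-KMS"
        conf["spark.hadoop.fs.s3a.server-side-encryption.key"] = kms_key_arn
    return conf
```

Each entry could then be applied with `SparkSession.builder.config(key, value)`. Depending on the Spark build, additional committer bindings may also be needed, so this should be read as a starting point rather than a complete setup.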




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650408514


   Thank you so much, @dbtsai !




[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648542765


   **[Test build #124445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124445/testReport)** for PR 28897 at commit [`2434365`](https://github.com/apache/spark/commit/243436582164fedd04b28f450578587743df657a).




[GitHub] [spark] dongjoon-hyun commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-650401065


   Gentle ping once again.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-649866080








[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647797755


   **[Test build #124371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124371/testReport)** for PR 28897 at commit [`9663de5`](https://github.com/apache/spark/commit/9663de5370400bfb1efc0e821de2d6298901a3fc).




[GitHub] [spark] SparkQA removed a comment on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647802479


   **[Test build #124372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124372/testReport)** for PR 28897 at commit [`caf50d1`](https://github.com/apache/spark/commit/caf50d12e41318dba32a1340793ee98379009f8c).




[GitHub] [spark] AmplabJenkins commented on pull request #28897: [SPARK-32058][BUILD] Use Apache Hadoop 3.2.0 dependency by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-647853176





