Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/01/05 04:39:41 UTC

[GitHub] [orc] dongjoon-hyun opened a new pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun opened a new pull request #607:
URL: https://github.com/apache/orc/pull/607


   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade the `tools` module to use Hadoop 2.10.1.
   
   ### Why are the changes needed?
   
   This will make the `orc-tools` CLI use the latest Hadoop 2.x.
   
   ### How was this patch tested?
   
   Pass the CIs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-757556878


   I'm also curious about that, @pgaref.
   - ORC-603 was created by @omalley a year ago (February 2020), but its description has only one sentence, `We should upgrade the version of Hadoop used for the tools jar to the higher 2.7.7 from 2.7.3.`, without any context. That's why I pinged @omalley here.
   - BTW, the type of ORC-603 is BUG, so @omalley seems to have hit some kind of bug at that time.



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-757525686


   Of course, we can move to a newer version later too. The written scope of ORC-603 was 2.7.x, so I stayed on the 2.x line and chose 2.10.1 here.



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #607: ORC-603. Upgrade `tools` to use Hadoop 3.2.2

dongjoon-hyun commented on a change in pull request #607:
URL: https://github.com/apache/orc/pull/607#discussion_r555923333



##########
File path: java/tools/pom.xml
##########
@@ -57,13 +57,13 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <version>${tools.hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
+      <artifactId>hadoop-hdfs-client</artifactId>

Review comment:
       Do you think it's safe? Of course, we should bump `hadoop.version` later, but I don't think that is as trivial as this change, @pgaref.
   
   Hadoop 3 has many breaking changes. For example, https://github.com/apache/orc/pull/608 was also caused by a behavior change in the Hadoop layer. This PR follows @omalley's decision. He explicitly mentioned the `tools` jar's Hadoop version, not the global Hadoop version.
   > We should upgrade the version of Hadoop used for the tools jar to the higher 2.7.7 from 2.7.3.
   
   I believe we can upgrade `tools` first and bump the project-wide version later with more testing.





[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-754390703


   cc @omalley 



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-758838881


   Following your advice, I updated the PR and added the `mvn dependency:tree` result to the PR description, @pgaref.



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-755076567


   Could you review this, @omalley? Is this perhaps different from what you intended in ORC-603?



[GitHub] [orc] pgaref commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

pgaref commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-758880442


   Thanks a lot, @dongjoon-hyun!
   +1 on the latest PR.



[GitHub] [orc] dongjoon-hyun edited a comment on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun edited a comment on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-758087948


   > Otherwise we could also try bumping directly to hadoop 3 right?
   
   I updated the PR to use Apache Hadoop 3.2.2. It looks more stable than 3.3.0.



[GitHub] [orc] pgaref commented on a change in pull request #607: ORC-603. Upgrade `tools` to use Hadoop 3.2.2

pgaref commented on a change in pull request #607:
URL: https://github.com/apache/orc/pull/607#discussion_r555932213



##########
File path: java/tools/pom.xml
##########
@@ -57,13 +57,13 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <version>${tools.hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
+      <artifactId>hadoop-hdfs-client</artifactId>

Review comment:
       Thanks for clarifying, Dongjoon!
   Since this ticket targets the specific issue mentioned by Owen, I would vote for bumping to 2.10, which is cleaner. In a follow-up we can target Hadoop 3 project-wide. What do you think?
   Apologies for the back and forth :)





[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-758886655


   Thank you, @pgaref . It's merged to master for Apache ORC 1.7.0 first.



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on a change in pull request #607:
URL: https://github.com/apache/orc/pull/607#discussion_r555200164



##########
File path: java/tools/pom.xml
##########
@@ -57,13 +57,13 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <version>${tools.hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
+      <artifactId>hadoop-hdfs-client</artifactId>

Review comment:
       For Apache Hadoop 2.10, we don't need to change this dependency.
   For Apache Hadoop 3.2.2, we need this change.
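   
   A minimal sketch of the two variants described in this comment, assuming the rest of each dependency block matches the hunk above. The `<version>` line under `hadoop-hdfs-client` is an assumption, since the quoted diff is cut off right after the `artifactId` change.
   
   ```xml
   <!-- Variant for Hadoop 2.10.x: the existing artifact can stay;
        only the version property is swapped. -->
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-hdfs</artifactId>
     <version>${tools.hadoop.version}</version>
   </dependency>
   
   <!-- Variant for Hadoop 3.2.2: the artifact is switched to the client
        module, as in the hunk. The version line is assumed, not quoted. -->
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-hdfs-client</artifactId>
     <version>${tools.hadoop.version}</version>
   </dependency>
   ```
   
   Only one of the two blocks would appear in `java/tools/pom.xml` at a time, depending on which Hadoop line `tools.hadoop.version` points to.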





[GitHub] [orc] dongjoon-hyun commented on a change in pull request #607: ORC-603. Upgrade `tools` to use Hadoop 3.2.2

dongjoon-hyun commented on a change in pull request #607:
URL: https://github.com/apache/orc/pull/607#discussion_r555968199



##########
File path: java/tools/pom.xml
##########
@@ -57,13 +57,13 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <version>${tools.hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
+      <artifactId>hadoop-hdfs-client</artifactId>

Review comment:
       No problem. Thank YOU, @pgaref . What I wanted was your real feedback here. :) I'll move back to `2.10.x`. 





[GitHub] [orc] pgaref commented on a change in pull request #607: ORC-603. Upgrade `tools` to use Hadoop 3.2.2

pgaref commented on a change in pull request #607:
URL: https://github.com/apache/orc/pull/607#discussion_r555677598



##########
File path: java/tools/pom.xml
##########
@@ -57,13 +57,13 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <version>${tools.hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
+      <artifactId>hadoop-hdfs-client</artifactId>

Review comment:
       Tests seem to pass for 3.2.2 -- I am not sure we should have an explicit Hadoop version for `tools`, though.
   Any reason we are not bumping `hadoop.version` project-wide? We could still keep the minimum supported version at 2.2, right?
   
   ```
       <min.hadoop.version>2.2.0</min.hadoop.version>
       <hadoop.version>2.7.3</hadoop.version>
       <tools.hadoop.version>3.2.2</tools.hadoop.version>
   ```
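   
   For context, a rough sketch of how these three properties could be wired together. It assumes they live in a `<properties>` block of the parent POM and that `java/tools/pom.xml` inherits them; the value `2.10.1` is taken from the PR title rather than from a quoted diff, so treat it as an assumption.
   
   ```xml
   <!-- Parent POM (location assumed): one property per concern. -->
   <properties>
     <!-- oldest Hadoop release the project still supports -->
     <min.hadoop.version>2.2.0</min.hadoop.version>
     <!-- Hadoop version used by the main modules -->
     <hadoop.version>2.7.3</hadoop.version>
     <!-- Hadoop version used only by the tools module -->
     <tools.hadoop.version>2.10.1</tools.hadoop.version>
   </properties>
   
   <!-- java/tools/pom.xml: the tools module refers to its own property. -->
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-common</artifactId>
     <version>${tools.hadoop.version}</version>
     <scope>compile</scope>
   </dependency>
   ```
   
   Keeping a separate property lets the `tools` jar move ahead while `hadoop.version` stays untouched for the other modules. Maven also lets a property be overridden for a single build with `-Dtools.hadoop.version=...` on the command line, which may help when trying another Hadoop release locally.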





[GitHub] [orc] dongjoon-hyun merged pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun merged pull request #607:
URL: https://github.com/apache/orc/pull/607


   



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-757525481


   Could you review this, @omalley and @pgaref ?



[GitHub] [orc] dongjoon-hyun commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

dongjoon-hyun commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-758077333


   For Hadoop 3, we have two options.
   - Apache Hadoop `3.2.2` was released a few days ago (2021 Jan 9)
   - Apache Hadoop `3.3.0` was released last year (2020 Jul 14)
   
   
   



[GitHub] [orc] pgaref commented on pull request #607: ORC-603. Upgrade `tools` to use Hadoop 2.10.1

pgaref commented on pull request #607:
URL: https://github.com/apache/orc/pull/607#issuecomment-757548260


   Hey @dongjoon-hyun  -- bumping to Hadoop 2.10 looks fine to me but I am kinda missing context here.
   What was the main reason for bumping here? Are we after a bug fix? a feature?
   Otherwise we could also try bumping directly to hadoop 3 right?

