Posted to commits@celeborn.apache.org by zh...@apache.org on 2023/09/25 11:52:17 UTC

[incubator-celeborn] branch main updated: [CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build

This is an automated email from the ASF dual-hosted git repository.

zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new f8091411e [CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build
f8091411e is described below

commit f8091411e54b37f49fcbf7b32bfd5ea7298f2a24
Author: Mridul Muralidharan <mridulatgmail.com>
AuthorDate: Mon Sep 25 19:52:09 2023 +0800

    [CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build
    
    ### What changes were proposed in this pull request?
    
    Add support for Apache Hadoop 2.x in the Celeborn build.
    Developers only need to specify `hadoop.version`; the build picks the matching profile internally, based on the version, and adds the relevant dependencies.
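
    The version-to-profile mapping can be sketched in shell. `hadoop_profile_for` is a hypothetical helper, not part of the Celeborn build. It only mirrors the `/^3\..*$/` and `/^2\..*$/` activation values from the root `pom.xml`, assuming the `regex-profile-activator` extension treats a slash-delimited value as a regular expression matched against the whole `hadoop.version` string. In the build itself, the selected root profile then sets `hadoop-3-deps` or `hadoop-2-deps`, and that property activates the profile with the same id in `common`, `master` and `client-mr/mr`.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper (not part of Celeborn): mirrors the regex activation
    # values of the hadoop-3 and hadoop-2 profiles in the root pom.xml.
    hadoop_profile_for() {
      local version="$1"
      local re3='^3\..*$'   # same pattern as the hadoop-3 profile
      local re2='^2\..*$'   # same pattern as the hadoop-2 profile
      if [[ "$version" =~ $re3 ]]; then
        echo "hadoop-3"     # adds hadoop-client-api + hadoop-client-runtime
      elif [[ "$version" =~ $re2 ]]; then
        echo "hadoop-2"     # adds hadoop-client
      else
        echo "none"         # neither profile activates
      fi
    }

    for v in 3.2.4 3.3.0 2.10.0 1.2.1; do
      echo "$v -> $(hadoop_profile_for "$v")"
    done
    # prints:
    # 3.2.4 -> hadoop-3
    # 3.3.0 -> hadoop-3
    # 2.10.0 -> hadoop-2
    # 1.2.1 -> none
    ```

    Because only the major-version prefix is matched, any 3.x release picks `hadoop-3` with no extra flags, while a version matching neither pattern activates neither profile, so neither set of Hadoop client dependencies is added.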
    
    ### Why are the changes needed?
    
    [hadoop-client-api](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api) and [hadoop-client-runtime](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime) were introduced in Hadoop 3.x, while Hadoop 2.x provides [hadoop-client](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client).
    Celeborn depends on the former and therefore requires Hadoop 3.x to build.
    
    Apache Spark dropped support for Hadoop 2.x only in the recent v3.5 ([SPARK-42452](https://issues.apache.org/jira/browse/SPARK-42452)). As a result, deployments on still-supported platforms, such as Spark 3.4 and older running on Hadoop 2.x, would need to pull in Hadoop 3.x just for Celeborn.
    
    This PR uses `hadoop-client` when `hadoop.version` is set to a 2.x release, and preserves the existing behavior when `hadoop.version` is 3.x.
    
    Note: while using `hadoop-client` with 3.x is an option, the Hadoop community recommends relying on `hadoop-client-api`/`hadoop-client-runtime`, so this PR keeps using them wherever possible.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Adds support for setting `hadoop.version` to a 2.x release.
    
    ### How was this patch tested?
    
    Three combinations were tested:
    
    * Default, without overriding hadoop.version
    
    Dependencies:
    ```
    $ build/mvn dependency:list 2>&1 | grep hadoop | sort | uniq
    [INFO]    org.apache.hadoop:hadoop-client-api:jar:3.2.4:compile
    [INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.2.4:compile
    ```
    
    Will update this section again based on test suite results (which are ongoing)
    
    * Explicitly setting hadoop.version to the newer 3.3.0
    
    Dependencies:
    ```
    $ ARGS="-Pspark-3.1 -Dhadoop.version=3.3.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | sort | uniq
    [INFO]    org.apache.hadoop:hadoop-client-api:jar:3.3.0:compile
    [INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.3.0:compile
    ```
    
    * Setting hadoop.version to the older 2.10.0
    
    Dependencies:
    ```
    $ ARGS="-Pspark-3.1 -Dhadoop.version=2.10.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | grep compile | sort | uniq
    [INFO]    org.apache.hadoop:hadoop-auth:jar:2.10.0:compile -- module hadoop.auth (auto)
    [INFO]    org.apache.hadoop:hadoop-client:jar:2.10.0:compile -- module hadoop.client (auto)
    [INFO]    org.apache.hadoop:hadoop-common:jar:2.10.0:compile -- module hadoop.common (auto)
    [INFO]    org.apache.hadoop:hadoop-hdfs-client:jar:2.10.0:compile -- module hadoop.hdfs.client (auto)
    [INFO]    org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.10.0:compile -- module hadoop.mapreduce.client.app (auto)
    [INFO]    org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.10.0:compile -- module hadoop.mapreduce.client.common (auto)
    [INFO]    org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.10.0:compile -- module hadoop.mapreduce.client.core (auto)
    [INFO]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.10.0:compile
    [INFO]    org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.10.0:compile -- module hadoop.mapreduce.client.shuffle (auto)
    [INFO]    org.apache.hadoop:hadoop-yarn-api:jar:2.10.0:compile -- module hadoop.yarn.api (auto)
    [INFO]    org.apache.hadoop:hadoop-yarn-common:jar:2.10.0:compile -- module hadoop.yarn.common (auto)
    ```
    
    For each of the cases above, the build and tests pass with the corresponding `ARGS`.
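
    The dependency checks above were done by eye with `grep`. As a sketch, the same check could be scripted so that each `hadoop.version` yields exactly the expected client artifacts. The helper names (`hadoop_client_artifacts`, `expected_client_artifacts`, `check_hadoop_deps`) are hypothetical and not part of Celeborn, and the parsing assumes the `groupId:artifactId:jar:version:scope` format shown in the listings above.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helpers, not part of the Celeborn build.

    # Print the distinct Hadoop client artifacts (hadoop-client, hadoop-client-api,
    # hadoop-client-runtime) in compile scope at exactly version $1.
    # Reads `mvn dependency:list` output on stdin.
    hadoop_client_artifacts() {
      awk -v ver="$1" '{
        for (i = 1; i <= NF; i++) {
          n = split($i, p, ":")   # groupId:artifactId:type:version:scope
          if (n == 5 && p[1] == "org.apache.hadoop" &&
              p[2] ~ /^hadoop-client(-api|-runtime)?$/ &&
              p[3] == "jar" && p[4] == ver && p[5] == "compile")
            print p[2]
        }
      }' | sort -u
    }

    # Expected client artifacts per major version, as described in this PR.
    expected_client_artifacts() {
      case "$1" in
        3.*) printf '%s\n' hadoop-client-api hadoop-client-runtime ;;
        2.*) printf '%s\n' hadoop-client ;;
        *)   return 1 ;;  # unsupported major version
      esac
    }

    # Succeed only if the listing ($2) contains exactly the expected client
    # artifacts for hadoop.version $1, all at that version.
    check_hadoop_deps() {
      local version="$1" listing="$2" expected actual
      expected="$(expected_client_artifacts "$version")" || return 1
      actual="$(printf '%s\n' "$listing" | hadoop_client_artifacts "$version")"
      [ "$actual" = "$expected" ]
    }

    # Example using the 3.3.0 output shown above.
    sample='[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.3.0:compile
    [INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.3.0:compile'
    check_hadoop_deps 3.3.0 "$sample" && echo "3.3.0: OK"   # prints "3.3.0: OK"
    ```

    Against a real build, the listing would come from the same command used above, for example `check_hadoop_deps 2.10.0 "$(build/mvn dependency:list -Pspark-3.1 -Dhadoop.version=2.10.0 2>&1)"`. Matching the exact version as well as the artifact name also catches a mismatched `hadoop.version`.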
    
    Closes #1936 from mridulm/main.
    
    Authored-by: Mridul Muralidharan <mridulatgmail.com>
    Signed-off-by: zky.zhoukeyong <zk...@alibaba-inc.com>
---
 .mvn/extensions.xml  | 25 ++++++++++++++++++
 LICENSE-binary       | 11 ++++++++
 client-mr/mr/pom.xml | 48 +++++++++++++++++++++++++++--------
 common/pom.xml       | 48 +++++++++++++++++++++++++++--------
 master/pom.xml       | 47 ++++++++++++++++++++++++++--------
 pom.xml              | 71 ++++++++++++++++++++++++++++++++++++++++++++--------
 6 files changed, 209 insertions(+), 41 deletions(-)

diff --git a/.mvn/extensions.xml b/.mvn/extensions.xml
new file mode 100644
index 000000000..3bfbd1dc2
--- /dev/null
+++ b/.mvn/extensions.xml
@@ -0,0 +1,25 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+<extensions xmlns="http://maven.apache.org/EXTENSIONS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+            xsi:schemaLocation="http://maven.apache.org/EXTENSIONS/1.0.0 http://maven.apache.org/xsd/core-extensions-1.0.0.xsd">
+    <extension>
+        <groupId>fish.payara.maven.extensions</groupId>
+        <artifactId>regex-profile-activator</artifactId>
+        <version>0.5</version>
+    </extension>
+</extensions>
diff --git a/LICENSE-binary b/LICENSE-binary
index 6971f6b64..7be2901c1 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -243,8 +243,19 @@ io.netty:netty-transport-sctp
 io.netty:netty-transport-udt
 org.apache.commons:commons-crypto
 org.apache.commons:commons-lang3
+org.apache.hadoop:hadoop-auth
+org.apache.hadoop:hadoop-client
 org.apache.hadoop:hadoop-client-api
 org.apache.hadoop:hadoop-client-runtime
+org.apache.hadoop:hadoop-common
+org.apache.hadoop:hadoop-hdfs-client
+org.apache.hadoop:hadoop-mapreduce-client-app
+org.apache.hadoop:hadoop-mapreduce-client-common
+org.apache.hadoop:hadoop-mapreduce-client-core
+org.apache.hadoop:hadoop-mapreduce-client-jobclient
+org.apache.hadoop:hadoop-mapreduce-client-shuffle
+org.apache.hadoop:hadoop-yarn-api
+org.apache.hadoop:hadoop-yarn-common
 org.apache.htrace:htrace-core4
 org.apache.logging.log4j:log4j-1.2-api
 org.apache.logging.log4j:log4j-api
diff --git a/client-mr/mr/pom.xml b/client-mr/mr/pom.xml
index 402ffa507..8b16614cb 100644
--- a/client-mr/mr/pom.xml
+++ b/client-mr/mr/pom.xml
@@ -39,20 +39,48 @@
       <artifactId>celeborn-client_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-api</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-runtime</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-mapreduce-client-app</artifactId>
       <version>${hadoop.version}</version>
     </dependency>
   </dependencies>
+
+  <profiles>
+    <profile>
+      <id>hadoop-3</id>
+      <activation>
+        <property>
+          <name>hadoop-3-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-api</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-runtime</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>hadoop-2</id>
+      <activation>
+        <property>
+          <name>hadoop-2-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+  </profiles>
 </project>
diff --git a/common/pom.xml b/common/pom.xml
index b88552ac6..7af0e9b9e 100644
--- a/common/pom.xml
+++ b/common/pom.xml
@@ -107,17 +107,6 @@
       <artifactId>scala-reflect</artifactId>
     </dependency>
 
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-api</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-runtime</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-
     <dependency>
       <groupId>org.roaringbitmap</groupId>
       <artifactId>RoaringBitmap</artifactId>
@@ -156,4 +145,41 @@
     </extensions>
   </build>
 
+  <profiles>
+    <profile>
+      <id>hadoop-3</id>
+      <activation>
+        <property>
+          <name>hadoop-3-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-api</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-runtime</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>hadoop-2</id>
+      <activation>
+        <property>
+          <name>hadoop-2-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+  </profiles>
 </project>
diff --git a/master/pom.xml b/master/pom.xml
index 8facb5585..a5260a1ce 100644
--- a/master/pom.xml
+++ b/master/pom.xml
@@ -82,16 +82,6 @@
       <groupId>org.apache.logging.log4j</groupId>
       <artifactId>log4j-1.2-api</artifactId>
     </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-api</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client-runtime</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
 
     <dependency>
       <groupId>org.mockito</groupId>
@@ -119,4 +109,41 @@
       </extension>
     </extensions>
   </build>
+  <profiles>
+    <profile>
+      <id>hadoop-3</id>
+      <activation>
+        <property>
+          <name>hadoop-3-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-api</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-runtime</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>hadoop-2</id>
+      <activation>
+        <property>
+          <name>hadoop-2-deps</name>
+        </property>
+      </activation>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+  </profiles>
 </project>
diff --git a/pom.xml b/pom.xml
index 287376cf9..31efbcf12 100644
--- a/pom.xml
+++ b/pom.xml
@@ -380,16 +380,6 @@
         <artifactId>snakeyaml</artifactId>
         <version>${snakeyaml.version}</version>
       </dependency>
-      <dependency>
-        <groupId>org.apache.hadoop</groupId>
-        <artifactId>hadoop-client-api</artifactId>
-        <version>${hadoop.version}</version>
-      </dependency>
-      <dependency>
-        <groupId>org.apache.hadoop</groupId>
-        <artifactId>hadoop-client-runtime</artifactId>
-        <version>${hadoop.version}</version>
-      </dependency>
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-mapreduce-client-app</artifactId>
@@ -921,6 +911,67 @@
   </build>
 
   <profiles>
+    <profile>
+      <id>hadoop-3</id>
+      <activation>
+        <property>
+          <name>hadoop.version</name>
+          <value>/^3\..*$/</value>
+        </property>
+      </activation>
+      <properties>
+        <hadoop-3-deps>true</hadoop-3-deps>
+      </properties>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-api</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client-runtime</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>hadoop-2</id>
+      <activation>
+        <property>
+          <name>hadoop.version</name>
+          <value>/^2\..*$/</value>
+        </property>
+      </activation>
+      <properties>
+        <hadoop-2-deps>true</hadoop-2-deps>
+      </properties>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+          <version>${hadoop.version}</version>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-annotations</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-yarn-client</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-yarn-registry</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-yarn-server-common</artifactId>
+            </exclusion>
+          </exclusions>
+        </dependency>
+      </dependencies>
+    </profile>
     <profile>
       <id>spark-2.4</id>
       <modules>