Posted to notifications@linkis.apache.org by GitBox <gi...@apache.org> on 2023/01/11 15:53:21 UTC

[GitHub] [linkis] GuoPhilipse opened a new pull request, #4110: [Feat]support different hadoop version compile

GuoPhilipse opened a new pull request, #4110:
URL: https://github.com/apache/linkis/pull/4110

   
   ### What is the purpose of the change
   1. Some dependencies are redundant and can be optimized.
   2. Support compiling against different Hadoop versions, so users no longer need to modify source code to build a Linkis package for Hadoop 3 or Hadoop 2.
   
   ### Related issues/PRs
   
   Related issues: #4109 
   Related PR: #4110
   
   
   ### Brief change log
   
   1. Optimize the Hadoop dependencies.
   2. Add profiles to support Hadoop 2 and Hadoop 3.
   3. Keep the default Hadoop version (2.7.2) unchanged.
   
   
   ### Checklist
   
   - [x] I have read the [Contributing Guidelines on pull requests](https://github.com/facebook/docusaurus/blob/main/CONTRIBUTING.md#pull-requests).
   - [ ] I have explained the need for this PR and the problem it solves
   - [ ] I have explained the changes or the new features added to this PR
   - [ ] I have added tests corresponding to this change
   - [ ] I have updated the documentation to reflect this change
   - [ ] I have verified that this change is backward compatible (If not, please discuss on the [Linkis mailing list](https://linkis.apache.org/community/how-to-subscribe) first)
   - [ ] **If this is a code change**: I have written unit tests to fully verify the new behavior.
   
   
   
   <!--
   
   Note
   
   1. Mark the PR title as `[WIP] title` until it's ready to be reviewed.
   
   2. Always add/update tests for any changes unless you have a good reason.
   
   3. Always update the documentation to reflect the changes made in the PR.
   
   4. After the PR is submitted, please pay attention to the execution result of the GitHub Actions checks.
      If a key check fails, please fix it promptly.
   
   5. Before the PR is merged, if a commit is missing, you can continue to push commits.
   
   6. After you submit the PR, you can add the assistant on WeChat. The WeChat QR code is
      https://user-images.githubusercontent.com/7869972/176336986-d6b9be8f-d1d3-45f1-aa45-8e6adf5dd244.png
   
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@linkis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@linkis.apache.org
For additional commands, e-mail: notifications-help@linkis.apache.org


[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092643449


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   > I have also tried relocation, but found that Spark has hard dependencies on some hadoop-common classes (the shaded class types are rejected). @jackxu2011 do you have a better idea?
   > 
   > ```
   >   def hadoopFile[K, V](path : scala.Predef.String, inputFormatClass : scala.Predef.Class[_ <: org.apache.hadoop.mapred.InputFormat[K, V]], keyClass : scala.Predef.Class[K], valueClass : scala.Predef.Class[V], minPartitions : scala.Int = { /* compiled code */ }) : org.apache.spark.rdd.RDD[scala.Tuple2[K, V]] = { /* compiled code */ }
   > ```
   > 
   > ```
   >   def transfer(sc: SparkContext, path: String, encoding: String): RDD[String] = {
   >     sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], 1)
   >       .map(p => new String(p._2.getBytes, 0, p._2.getLength, encoding))
   >   }
   > ```
   
   If you rebuild Spark 2.4 against Hadoop 3, then in my opinion the shade module is unnecessary, because Spark 2.4 can then run directly with Hadoop 3. Using the variable `hadoop-hdfs-client.artifact` alone should resolve the problem.
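
   For context, the relocation approach mentioned in the quoted comment is typically configured in maven-shade-plugin roughly as follows. This is only a sketch: the target package `org.apache.linkis.shaded.hadoop` is a hypothetical name, not taken from this PR.

   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <executions>
       <execution>
         <phase>package</phase>
         <goals>
           <goal>shade</goal>
         </goals>
         <configuration>
           <relocations>
             <relocation>
               <!-- package to move out of the way -->
               <pattern>org.apache.hadoop</pattern>
               <!-- hypothetical target package, for illustration only -->
               <shadedPattern>org.apache.linkis.shaded.hadoop</shadedPattern>
             </relocation>
           </relocations>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```

   Relocation rewrites the package names of the bundled classes, so a relocated `TextInputFormat` would presumably no longer satisfy the `org.apache.hadoop.mapred.InputFormat` bound that `hadoopFile` expects, which is consistent with the rejection described in the quoted comment.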





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1093288779


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -418,6 +454,18 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-common</artifactId>
+      <version>${hadoop-hdfs-client-shade.version}</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-hdfs-client-shade.artifact}</artifactId>
+      <version>${hadoop-hdfs-client-shade.version}</version>

Review Comment:
   The build fails if no version is specified; `${hadoop.version}` seems more reasonable here.
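
   A minimal sketch of that suggestion, reusing the `hadoop-common` coordinates from the diff above and assuming `hadoop.version` is the property set by the active Hadoop profile in the parent pom:

   ```xml
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-common</artifactId>
     <!-- follows whichever Hadoop version the build was started with -->
     <version>${hadoop.version}</version>
     <scope>provided</scope>
   </dependency>
   ```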





[GitHub] [linkis] GuoPhilipse commented on pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1383063370

   I am going to shade hadoop-client separately. The aim is to avoid having two hadoop-client versions on the Spark or Hive engine classpath, and to resolve the conflict between an older Hive version (a forced dependency of spark-hive) and a newer Hadoop version.
   Do you have a better idea?
   




[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1070213310


##########
linkis-engineconn-plugins/hive/pom.xml:
##########
@@ -50,6 +50,12 @@
       <artifactId>linkis-storage</artifactId>
       <version>${project.version}</version>
       <scope>provided</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   Yes, indeed. If we compile with Hadoop 3, the common modules will certainly contain the Hadoop 3 client. Maybe we need to make the Hive and Spark modules work with Hadoop 3 instead of keeping the old hdfs-client version.





[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092640648


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   > I have tried it, but it does not work as expected. For now I have excluded hdfs from spark-hive, spark-core, and so on. If the `linkis-hadoop-hdfs-client-shade` dependency is moved to the `spark-2.4-hadoop-3.3` profile, then when we do not compile with hadoop-3.3 and spark-2.4-hadoop-3.3, the Spark module will lack an hdfs dependency. If we add linkis-hadoop-common to the Spark module, compilation fails with `Unrecognized Hadoop major version numb`. If we do not exclude hdfs from spark-core, spark-hive, and so on, then when we compile with hadoop-3.3 and spark-2.4-hadoop-3.3 the Spark module's output lib will contain the Hadoop 3 dependency, which brings back the version problem we had before.
   
   linkis-hadoop-common will be on the classpath when the Spark EC runs, because linkis-hadoop-common is in lib/linkis-commons/public-module.





[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1070190184


##########
linkis-engineconn-plugins/hive/pom.xml:
##########
@@ -50,6 +50,12 @@
       <artifactId>linkis-storage</artifactId>
       <version>${project.version}</version>
       <scope>provided</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   This is only useful at compile time.
   I mean that when the EC starts, its classpath includes lib/linkis-commons/public-module, which also contains the Hadoop client.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1070506339


##########
tool/dependencies/known-dependencies.txt:
##########
@@ -70,6 +70,7 @@ commons-io-2.11.0.jar
 commons-jxpath-1.3.jar
 commons-lang-2.6.jar
 commons-lang3-3.12.0.jar
+commons-logging-1.1.3.jar
 commons-logging-1.2.jar

Review Comment:
   Oh, got it. Maybe I can exclude this dependency where it is imported.
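
   Such an exclusion might look like the sketch below. The thread does not say which dependency pulls in commons-logging 1.1.3, so `hadoop-common` is only an assumption here; `mvn dependency:tree -Dincludes=commons-logging` can show the actual source.

   ```xml
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <!-- assumed origin of the transitive commons-logging 1.1.3 -->
     <artifactId>hadoop-common</artifactId>
     <version>${hadoop.version}</version>
     <exclusions>
       <exclusion>
         <groupId>commons-logging</groupId>
         <artifactId>commons-logging</artifactId>
       </exclusion>
     </exclusions>
   </dependency>
   ```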





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1094300563


##########
pom.xml:
##########
@@ -111,6 +112,9 @@
     <revision>1.3.2-SNAPSHOT</revision>
     <jedis.version>2.9.2</jedis.version>
     <hadoop.version>2.7.2</hadoop.version>
+    <hadoop-hdfs-client-shade.artifact>hadoop-hdfs</hadoop-hdfs-client-shade.artifact>

Review Comment:
   OK, we can leave it out for now; if it is needed when we support Spark 3, we will add it later.





[GitHub] [linkis] peacewong commented on pull request #4110: [Feat]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
peacewong commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1379783208

   @jackxu2011 




[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1068980942


##########
pom.xml:
##########
@@ -1349,6 +1248,20 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop3  -->
+    <profile>
+      <id>hadoop3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>

Review Comment:
   Makes sense, will update later.





[GitHub] [linkis] GuoPhilipse commented on pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1382647781

   > I found that Spark 2.4.3 only has a hadoop-3.1 profile. Does it support Hadoop 3.3?
   
   We currently use Spark 2.4.5. Spark branch-2.4 only provides hadoop-2.6, hadoop-2.7, and hadoop-3.1 profiles; users who need another Hadoop version have to recompile Spark.
   We run Spark 2.4.5 with HDFS clients 2.7.7 and 3.3.1 (on YARN) against a 3.3.1 HDFS cluster, and it works well.
   
   <img width="468" alt="image" src="https://user-images.githubusercontent.com/46367746/212448097-47aaca13-dc46-40b5-9921-24d9c0998623.png">
   <img width="974" alt="image" src="https://user-images.githubusercontent.com/46367746/212448371-38047c68-3911-41f3-af83-29318733a1d4.png">
   
   




[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1067625966


##########
tool/dependencies/known-dependencies.txt:
##########
@@ -70,6 +70,7 @@ commons-io-2.11.0.jar
 commons-jxpath-1.3.jar
 commons-lang-2.6.jar
 commons-lang3-3.12.0.jar
+commons-logging-1.1.3.jar
 commons-logging-1.2.jar

Review Comment:
   It would be better to add commons-logging 1.2 to the dependencyManagement section of the parent pom.xml.
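
   As a rough sketch, pinning the version in the parent pom.xml could look like this (only the `commons-logging` entry is shown; the existing managed dependencies are omitted):

   ```xml
   <dependencyManagement>
     <dependencies>
       <!-- forces every module to resolve commons-logging to 1.2 -->
       <dependency>
         <groupId>commons-logging</groupId>
         <artifactId>commons-logging</artifactId>
         <version>1.2</version>
       </dependency>
     </dependencies>
   </dependencyManagement>
   ```

   With the version managed centrally, commons-logging-1.1.3.jar should no longer need an entry in known-dependencies.txt.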



##########
pom.xml:
##########
@@ -1349,6 +1248,20 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop3  -->
+    <profile>
+      <id>hadoop3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>
+      </properties>
+    </profile>
+    <!-- hadoop version: mvn validate -Phadoop2  -->
+    <profile>
+      <id>hadoop2</id>

Review Comment:
   Would the name hadoop-2.7 be better?



##########
pom.xml:
##########
@@ -1349,6 +1248,20 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop3  -->
+    <profile>
+      <id>hadoop3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>

Review Comment:
   Should the hadoop3 profile also change curator.version?
   <curator.version>4.2.0</curator.version>



##########
pom.xml:
##########
@@ -1349,6 +1248,20 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop3  -->
+    <profile>
+      <id>hadoop3</id>

Review Comment:
   Would hadoop-3.3 be a better name?



##########
linkis-engineconn-plugins/hive/pom.xml:
##########
@@ -28,6 +28,7 @@
 
   <properties>
     <hive.version>2.3.3</hive.version>
+    <hadoop.version>2.7.2</hadoop.version>

Review Comment:
   This will put multiple Hadoop client versions on the classpath when the Hive EngineConn starts.



##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -27,6 +27,7 @@
   <artifactId>linkis-engineplugin-spark</artifactId>
   <properties>
     <spark.version>2.4.3</spark.version>
+    <hadoop.version>2.7.2</hadoop.version>

Review Comment:
   This will result in multiple Hadoop clients when the Spark EC starts.
   The EC classpath includes linkis-commons, which includes the Hadoop client.





[GitHub] [linkis] GuoPhilipse commented on pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1383102210

   > > I am going to shade hadoop-client separately. The aim is to avoid having two hadoop-client versions on the Spark or Hive engine classpath, and to resolve the conflict between an older Hive version (a forced dependency of spark-hive) and a newer Hadoop version. Do you have a better idea?
   > 
   > I think that when using Hadoop 3.3 we should upgrade to Spark and Hive versions that directly support Hadoop 3.3.
   
   That seems to be the easiest way to support different Hadoop versions.
   However, I wonder about users like me, with
   `hdfs cluster version is 3.3.1`
   `spark version is 2.4.3`
   `yarn client is 2.7.7`
   They may be unable to use Linkis until their client upgrade is finished. @peacewong what do you think?




[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1094165565


##########
pom.xml:
##########
@@ -111,6 +112,9 @@
     <revision>1.3.2-SNAPSHOT</revision>
     <jedis.version>2.9.2</jedis.version>
     <hadoop.version>2.7.2</hadoop.version>
+    <hadoop-hdfs-client-shade.artifact>hadoop-hdfs</hadoop-hdfs-client-shade.artifact>

Review Comment:
   @GuoPhilipse the variable hadoop-hdfs-client-shade.artifact can be removed; just write `hadoop-hdfs` directly.





[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1091635127


##########
pom.xml:
##########
@@ -1349,6 +1359,45 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop-3.3 ,spark still use hadoop2.7.2 by default, More details please check SPARK-23534   -->
+    <profile>
+      <id>hadoop-3.3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>
+        <curator.version>4.2.0</curator.version>
+        <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>
+      </properties>
+    </profile>
+    <!-- hadoop version: mvn validate -Phadoop-2.7  -->
+    <profile>
+      <id>hadoop-2.7</id>
+      <properties>
+        <hadoop.version>2.7.2</hadoop.version>
+        <curator.version>2.7.1</curator.version>
+      </properties>
+    </profile>
+    <!-- spark2-hadoop3 version:spark2.4 shade use hadoop2.7.2 by default mvn validate -Pspark-2.4-hadoop-3.3  -->
+    <profile>
+      <id>spark-2.4-hadoop-3.3</id>

Review Comment:
   This profile should be moved to the Spark pom.xml and should be used together with -Phadoop-3.3. The linkis-hadoop-hdfs-client-shade dependency should also go into this profile.
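
   A sketch of what this could look like in linkis-engineconn-plugins/spark/pom.xml, assuming the Hadoop properties stay in the root `hadoop-3.3` profile and only the shade dependency moves here:

   ```xml
   <profiles>
     <!-- intended to be activated together with the root hadoop-3.3 profile -->
     <profile>
       <id>spark-2.4-hadoop-3.3</id>
       <dependencies>
         <dependency>
           <groupId>org.apache.linkis</groupId>
           <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
           <version>${project.version}</version>
         </dependency>
       </dependencies>
     </profile>
   </profiles>
   ```

   Both profiles would then be activated in one build, for example with `-Phadoop-3.3,spark-2.4-hadoop-3.3`.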



##########
pom.xml:
##########
@@ -1349,6 +1359,45 @@
   </build>
 
   <profiles>
+    <!-- hadoop version: mvn validate -Phadoop-3.3 ,spark still use hadoop2.7.2 by default, More details please check SPARK-23534   -->
+    <profile>
+      <id>hadoop-3.3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>
+        <curator.version>4.2.0</curator.version>
+        <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>
+      </properties>
+    </profile>
+    <!-- hadoop version: mvn validate -Phadoop-2.7  -->
+    <profile>
+      <id>hadoop-2.7</id>
+      <properties>
+        <hadoop.version>2.7.2</hadoop.version>
+        <curator.version>2.7.1</curator.version>
+      </properties>
+    </profile>
+    <!-- spark2-hadoop3 version:spark2.4 shade use hadoop2.7.2 by default mvn validate -Pspark-2.4-hadoop-3.3  -->
+    <profile>
+      <id>spark-2.4-hadoop-3.3</id>
+      <properties>
+        <hadoop.version>3.3.1</hadoop.version>
+        <curator.version>4.2.0</curator.version>
+        <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>

Review Comment:
   These properties are the same as in the hadoop-3.3 profile, so they can be omitted.



##########
linkis-hadoop-hdfs-client-shade/dependency-reduced-pom.xml:
##########
@@ -0,0 +1,281 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review Comment:
   This file should be ignored.
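
   dependency-reduced-pom.xml is generated by maven-shade-plugin during the build. Besides adding it to .gitignore, one possible way to avoid it (an assumption about what fits this module, not something stated in the review) is to turn generation off:

   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <configuration>
       <!-- do not write dependency-reduced-pom.xml next to the module pom -->
       <createDependencyReducedPom>false</createDependencyReducedPom>
     </configuration>
   </plugin>
   ```

   The trade-off is that consumers of the shaded artifact then still see the original dependency list, so a .gitignore entry may be the safer choice if the reduced POM is wanted.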



##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   Move the spark-2.4-hadoop-3.3 profile to this pom, and add the dependency to that profile.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1091868630


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   Yep, that is another way. Users would need to compile with the hadoop3 profile at the same time; the shade jar will be needed just in case.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092650251


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   > > Yep. Currently, when we run the Spark EC, the HDFS dependency in `linkis-hadoop-hdfs-client-shade` takes effect, even though `linkis-hadoop-common` is on the classpath.
   > 
   > So, can you add `linkis-hadoop-common` as provided?
   
   Hmm, I'm not sure whether the compile will pass. I will try it later; I'm about to leave for the office.





[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092646410


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   > Yep. Currently, when we run the Spark EC, the HDFS dependency in `linkis-hadoop-hdfs-client-shade` takes effect, even though `linkis-hadoop-common` is on the classpath.
   
   So, can you add `linkis-hadoop-common` as provided?
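
   A sketch of the `provided` idea, assuming `linkis-hadoop-common` uses the same `org.apache.linkis` groupId and `${project.version}` as the other Linkis modules shown in this diff (both are assumptions here):

   ```xml
   <dependency>
     <!-- groupId and version are assumed to match the sibling Linkis modules -->
     <groupId>org.apache.linkis</groupId>
     <artifactId>linkis-hadoop-common</artifactId>
     <version>${project.version}</version>
     <!-- provided: on the compile classpath, but not packaged or passed on transitively -->
     <scope>provided</scope>
   </dependency>
   ```

   Whether the module still compiles with this scope is exactly the open question in this thread.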





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1091942713


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   I have tried it, and it does not seem to work as expected.
   For now I have excluded hdfs from spark-hive, spark-core, and so on. The cases I ran into:
   - If the `linkis-hadoop-hdfs-client-shade` dependency is moved to the `spark-2.4-hadoop-3.3` profile, then when we do not compile with hadoop-3.3 and spark-2.4-hadoop-3.3, the spark module will lack the hdfs dependency.
   - If we add linkis-hadoop-common to the spark module, the compile fails with `Unrecognized Hadoop major version numb`.
   - If we do not exclude hdfs from spark-core, spark-hive, and so on, then when we compile with hadoop-3.3 and spark-2.4-hadoop-3.3, the spark module's output lib will contain hadoop3 dependencies, which brings back the problem from our previous version.





[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1093242093


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -418,6 +454,18 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-common</artifactId>
+      <version>${hadoop-hdfs-client-shade.version}</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-hdfs-client-shade.artifact}</artifactId>
+      <version>${hadoop-hdfs-client-shade.version}</version>

Review Comment:
   Can the version property be removed here? Or use `${hadoop.version}` instead.
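
   A sketch of the second option, reusing the coordinates from the diff above and swapping only the version property. It illustrates the suggestion and is not necessarily what was merged:

   ```xml
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>${hadoop-hdfs-client-shade.artifact}</artifactId>
     <!-- follows the global Hadoop version instead of a separate shade version -->
     <version>${hadoop.version}</version>
   </dependency>
   ```

   Dropping `<version>` entirely would only resolve if the parent pom manages this artifact in `<dependencyManagement>`, which is an assumption about the parent pom.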



##########
linkis-engineconn-plugins/shell/pom.xml:
##########
@@ -111,6 +111,10 @@
       <scope>provided</scope>
     </dependency>
 
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
+    </dependency>

Review Comment:
   this dependency can be removed



##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -418,6 +454,18 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-common</artifactId>
+      <version>${hadoop-hdfs-client-shade.version}</version>

Review Comment:
   Can the version property be removed here? Or use `${hadoop.version}` instead.



##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -140,6 +140,11 @@
       </exclusions>
     </dependency>
 
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
+    </dependency>

Review Comment:
   this dependency can be removed



##########
linkis-engineconn-plugins/python/pom.xml:
##########
@@ -67,6 +67,11 @@
       <scope>provided</scope>
     </dependency>
 
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
+    </dependency>

Review Comment:
   this dependency can be removed





[GitHub] [linkis] jackxu2011 commented on pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
jackxu2011 commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1383092731

   > I am going to shade the hadoop-client separately. It is intended to solve having two hadoop-client versions in the spark or hive engine classpath, and to solve the conflict between a lower hive version (a forced dependency of spark-hive) and a higher hadoop version. Do you have better ideas for that?
   
   I think that when using hadoop 3.3 we should upgrade spark and hive to versions which directly support hadoop 3.3.




[GitHub] [linkis] jackxu2011 commented on pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
jackxu2011 commented on PR #4110:
URL: https://github.com/apache/linkis/pull/4110#issuecomment-1382636076

   I found that spark 2.4.3 only has a hadoop 3.1 profile. Does it support hadoop 3.3?




[GitHub] [linkis] peacewong commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "peacewong (via GitHub)" <gi...@apache.org>.
peacewong commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1091579447


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -112,7 +112,7 @@
 
         <exclusion>
           <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-hdfs</artifactId>
+          <artifactId>${hadoop-hdfs-client.artifact}</artifactId>

Review Comment:
   Can it be configured as `*`?
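
   A sketch of a wildcard exclusion, which Maven supports since 3.2.1. The enclosing dependency is hypothetical (the diff context does not show which dependency owns this exclusion), and its version is assumed to be managed elsewhere:

   ```xml
   <dependency>
     <!-- hypothetical owner of the exclusion; version assumed to be managed by the parent pom -->
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_${scala.binary.version}</artifactId>
     <exclusions>
       <exclusion>
         <groupId>org.apache.hadoop</groupId>
         <!-- wildcard: excludes every org.apache.hadoop artifact, not just the hdfs client -->
         <artifactId>*</artifactId>
       </exclusion>
     </exclusions>
   </dependency>
   ```

   The trade-off is that `*` also removes Hadoop artifacts other than the hdfs client, so it is broader than the property-based exclusion in the diff.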



##########
pom.xml:
##########
@@ -111,6 +112,9 @@
     <revision>1.3.2-SNAPSHOT</revision>
     <jedis.version>2.9.2</jedis.version>
     <hadoop.version>2.7.2</hadoop.version>
+    <hadoop-hdfs-client-shade.artifact>hadoop-hdfs</hadoop-hdfs-client-shade.artifact>

Review Comment:
   What is the difference between hadoop-hdfs-client-shade.artifact and hadoop-hdfs-client.artifact?





[GitHub] [linkis] peacewong merged pull request #4110: [Feat]support different hadoop version compile

Posted by "peacewong (via GitHub)" <gi...@apache.org>.
peacewong merged PR #4110:
URL: https://github.com/apache/linkis/pull/4110




[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
jackxu2011 commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1070278476


##########
tool/dependencies/known-dependencies.txt:
##########
@@ -70,6 +70,7 @@ commons-io-2.11.0.jar
 commons-jxpath-1.3.jar
 commons-lang-2.6.jar
 commons-lang3-3.12.0.jar
+commons-logging-1.1.3.jar
 commons-logging-1.2.jar

Review Comment:
   I mean: add commons-logging 1.2 to the dependencyManagement in the parent pom.xml, and commons-logging-1.1.3.jar will disappear.
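
   A minimal sketch of that entry, assuming it is added to the existing `<dependencyManagement>` section of the parent pom.xml:

   ```xml
   <dependencyManagement>
     <dependencies>
       <!-- pins every transitive commons-logging to 1.2, so 1.1.3 is no longer resolved -->
       <dependency>
         <groupId>commons-logging</groupId>
         <artifactId>commons-logging</artifactId>
         <version>1.2</version>
       </dependency>
     </dependencies>
   </dependencyManagement>
   ```

   A managed version overrides the versions requested by transitive dependencies, which is why the extra jar would drop out of the packaged output.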





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [WIP]support different hadoop version compile

Posted by GitBox <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1069229806


##########
tool/dependencies/known-dependencies.txt:
##########
@@ -70,6 +70,7 @@ commons-io-2.11.0.jar
 commons-jxpath-1.3.jar
 commons-lang-2.6.jar
 commons-lang3-3.12.0.jar
+commons-logging-1.1.3.jar
 commons-logging-1.2.jar

Review Comment:
   It seems the cli will report unknown dependencies, so I added commons-logging-1.1.3.jar to the tool/dependencies/known-dependencies.txt file.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092644991


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   Yep. Currently, when we run the Spark EC, the HDFS dependency in `linkis-hadoop-hdfs-client-shade` takes effect, even though `linkis-hadoop-common` is on the classpath.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1094168730


##########
pom.xml:
##########
@@ -111,6 +112,9 @@
     <revision>1.3.2-SNAPSHOT</revision>
     <jedis.version>2.9.2</jedis.version>
     <hadoop.version>2.7.2</hadoop.version>
+    <hadoop-hdfs-client-shade.artifact>hadoop-hdfs</hadoop-hdfs-client-shade.artifact>

Review Comment:
   `hadoop-hdfs-client.artifact` is used as our hdfs client artifact across hadoop versions: it is `hadoop-hdfs` for hadoop2 and `hadoop-hdfs-client` for hadoop3, and its version is controlled by `hadoop.version`.
   
   `hadoop-hdfs-client-shade.artifact` is used as the hadoop shade artifact, currently only for spark. Different spark versions may need a different shaded hadoop version, so its version is controlled by `hadoop-hdfs-client-shade.version` instead of `hadoop.version`. `hadoop-hdfs-client-shade.artifact` is likewise `hadoop-hdfs` for hadoop2 and `hadoop-hdfs-client` for hadoop3, in preparation for possible future use with spark3.
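
   The two property families can be sketched as the following fragments. The `hadoop.version` values, the `hadoop-hdfs-client-shade.artifact` default, and the hadoop-3.3 profile values come from the diff; the default for `hadoop-hdfs-client.artifact` and the `2.7.2` value for `hadoop-hdfs-client-shade.version` are assumptions:

   ```xml
   <!-- fragment 1: root pom defaults (hadoop2) -->
   <properties>
     <hadoop.version>2.7.2</hadoop.version>
     <!-- assumed default; the diff only shows the hadoop3 override -->
     <hadoop-hdfs-client.artifact>hadoop-hdfs</hadoop-hdfs-client.artifact>
     <hadoop-hdfs-client-shade.artifact>hadoop-hdfs</hadoop-hdfs-client-shade.artifact>
     <!-- assumed value: the shade version is set independently of hadoop.version -->
     <hadoop-hdfs-client-shade.version>2.7.2</hadoop-hdfs-client-shade.version>
   </properties>

   <!-- fragment 2: the hadoop-3.3 profile overrides only the general client properties -->
   <profile>
     <id>hadoop-3.3</id>
     <properties>
       <hadoop.version>3.3.1</hadoop.version>
       <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>
     </properties>
   </profile>
   ```

   Under these assumptions, building with `-Phadoop-3.3` moves the general client to hadoop3 while the spark shade artifact stays on hadoop2.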





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1094176323


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -112,7 +112,7 @@
 
         <exclusion>
           <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-hdfs</artifactId>
+          <artifactId>${hadoop-hdfs-client.artifact}</artifactId>

Review Comment:
   I think it is fine.





[GitHub] [linkis] peacewong commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "peacewong (via GitHub)" <gi...@apache.org>.
peacewong commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1091575584


##########
linkis-engineconn-plugins/python/pom.xml:
##########
@@ -67,6 +67,11 @@
       <scope>provided</scope>
     </dependency>
 
+    <dependency>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
+    </dependency>

Review Comment:
   Is this necessary?





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4110: [Feat]support different hadoop version compile

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4110:
URL: https://github.com/apache/linkis/pull/4110#discussion_r1092033004


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -202,13 +207,103 @@
       <artifactId>linkis-rpc</artifactId>
       <version>${project.version}</version>
     </dependency>
-
+    <dependency>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   I have also tried relocation, but found that spark has hard dependencies on some hadoop-common classes (the shaded class types will be rejected). @jackxu2011, do you have better ideas?
   ```
     def hadoopFile[K, V](path : scala.Predef.String, inputFormatClass : scala.Predef.Class[_ <: org.apache.hadoop.mapred.InputFormat[K, V]], keyClass : scala.Predef.Class[K], valueClass : scala.Predef.Class[V], minPartitions : scala.Int = { /* compiled code */ }) : org.apache.spark.rdd.RDD[scala.Tuple2[K, V]] = { /* compiled code */ }
   
   ```
   ```
     def transfer(sc: SparkContext, path: String, encoding: String): RDD[String] = {
       sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], 1)
         .map(p => new String(p._2.getBytes, 0, p._2.getLength, encoding))
     }
   ```
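
   For context, a relocation of this kind could be configured roughly as follows with the maven-shade-plugin. The shaded package name is hypothetical, and this is a sketch of the approach that was tried, not the configuration in this PR:

   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <executions>
       <execution>
         <phase>package</phase>
         <goals>
           <goal>shade</goal>
         </goals>
         <configuration>
           <relocations>
             <relocation>
               <!-- rewrites every org.apache.hadoop class reference in the shaded jar -->
               <pattern>org.apache.hadoop</pattern>
               <!-- hypothetical target package -->
               <shadedPattern>org.apache.linkis.shaded.org.apache.hadoop</shadedPattern>
             </relocation>
           </relocations>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```

   After relocation, a class such as `TextInputFormat` lives under the shaded package, so it no longer conforms to the `org.apache.hadoop.mapred.InputFormat[K, V]` bound in the `hadoopFile` signature above. That mismatch is the rejection described in this comment.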


