Posted to notifications@linkis.apache.org by "GuoPhilipse (via GitHub)" <gi...@apache.org> on 2023/02/24 09:04:50 UTC

[GitHub] [linkis] GuoPhilipse opened a new pull request, #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

GuoPhilipse opened a new pull request, #4263:
URL: https://github.com/apache/linkis/pull/4263

   ### What is the purpose of the change
   
   We now support different Hadoop, Hive, and Spark versions, so we can upgrade the default Hadoop/Spark/Hive versions to 3.x to address possible security issues.
   
   ### Related issues/PRs
   
   Related issues: #4262
   Related PR: #4263
   
   
   ### Brief change log
   
   - upgrade Hadoop from 2.7.2 to 3.3.4
   - upgrade Hive from 2.3.3 to 3.1.3
   - upgrade Spark from 2.4.3 to 3.2.1
   - remove the spark-3.2 profile, since Spark will be 3.x by default
   - remove the hadoop-3.3 profile, since Hadoop will be 3.x by default
   - rename the spark-2.4-hadoop-3.3 profile to spark-2.4, since the Hadoop version will be 3.x by default
   - upgrade the default curator/json4s/scala/hadoop-hdfs-client.artifact properties to fit Hadoop 3/Spark 3 (see the property sketch below)
   - update the documentation introduction
   - update known dependencies
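   
   For reference, a minimal sketch of what the new default version properties in the root pom.xml could look like (illustrative only; hive.version and spark.version are assumed property names, while hadoop.version, curator.version and hadoop-hdfs-client.artifact appear in the diffs discussed below):
   
   ```xml
   <!-- Illustrative sketch only: new 3.x defaults in the root pom.xml.
        The hive.version / spark.version property names are assumptions. -->
   <properties>
     <hadoop.version>3.3.4</hadoop.version>
     <hive.version>3.1.3</hive.version>
     <spark.version>3.2.1</spark.version>
     <!-- companion properties bumped to match Hadoop 3 / Spark 3 -->
     <curator.version>4.2.0</curator.version>
     <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>
   </properties>
   ```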
   
   
   ### Checklist
   
   - [x] I have read the [Contributing Guidelines on pull requests](https://github.com/facebook/docusaurus/blob/main/CONTRIBUTING.md#pull-requests).
   - [ ] I have explained the need for this PR and the problem it solves
   - [ ] I have explained the changes or the new features added to this PR
   - [ ] I have added tests corresponding to this change
   - [ ] I have updated the documentation to reflect this change
   - [ ] I have verified that this change is backward compatible (If not, please discuss on the [Linkis mailing list](https://linkis.apache.org/community/how-to-subscribe) first)
   - [ ] **If this is a code change**: I have written unit tests to fully verify the new behavior.
   
   
   
   <!--
   
   Note
   
   1. Mark the PR title as `[WIP] title` until it is ready to be reviewed (WIP = work in progress).
   
   2. Always add/update tests for any change unless you have a good reason not to.
   
   3. Always update the documentation to reflect the changes made in the PR.
   
   4. After the PR is submitted, please watch the results of the GitHub Actions checks.
      If any check fails, please fix it in time.
   
   5. Before the PR is merged, if any commit is missing, you can continue to push code.
   
   6. After you submit the PR, you can add the assistant on WeChat; the WeChat QR code is
      https://user-images.githubusercontent.com/7869972/176336986-d6b9be8f-d1d3-45f1-aa45-8e6adf5dd244.png
   
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@linkis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@linkis.apache.org
For additional commands, e-mail: notifications-help@linkis.apache.org


[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4263:
URL: https://github.com/apache/linkis/pull/4263#discussion_r1117929352


##########
pom.xml:
##########
@@ -1352,30 +1352,13 @@
   </build>
 
   <profiles>
-    <!-- hadoop version: mvn validate -Phadoop-3.3 ,when used with spark2.x ,please add -Pspark-2.4-hadoop-3.3 together, More details please check SPARK-23534   -->
-    <profile>
-      <id>hadoop-3.3</id>
-      <properties>
-        <hadoop.version>3.3.1</hadoop.version>
-        <curator.version>4.2.0</curator.version>
-        <hadoop-hdfs-client.artifact>hadoop-hdfs-client</hadoop-hdfs-client.artifact>
-      </properties>
-    </profile>
     <!-- hadoop version: mvn validate -Phadoop-2.7  -->
     <profile>
       <id>hadoop-2.7</id>
       <properties>
         <hadoop.version>2.7.2</hadoop.version>
         <curator.version>2.7.1</curator.version>
-      </properties>
-    </profile>
-    <profile>
-      <id>spark-3.2</id>

Review Comment:
   This should be changed to spark-2.7, and it should override some properties.
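
   For illustration, a rough sketch of a legacy profile that overrides the new 3.x defaults (whether this lives in the hadoop-2.7 profile or a renamed spark profile is for the PR to decide; the hadoop-hdfs-client.artifact override is an assumption, since Hadoop 2.7 does not ship a separate hadoop-hdfs-client artifact):

   ```xml
   <!-- Illustrative sketch only: a legacy profile overriding the new 3.x defaults. -->
   <profile>
     <id>hadoop-2.7</id>
     <properties>
       <hadoop.version>2.7.2</hadoop.version>
       <curator.version>2.7.1</curator.version>
       <!-- Hadoop 2.7 has no hadoop-hdfs-client artifact, so fall back to hadoop-hdfs -->
       <hadoop-hdfs-client.artifact>hadoop-hdfs</hadoop-hdfs-client.artifact>
     </properties>
   </profile>
   ```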





[GitHub] [linkis] peacewong merged pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "peacewong (via GitHub)" <gi...@apache.org>.
peacewong merged PR #4263:
URL: https://github.com/apache/linkis/pull/4263




[GitHub] [linkis] jackxu2011 commented on a diff in pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "jackxu2011 (via GitHub)" <gi...@apache.org>.
jackxu2011 commented on code in PR #4263:
URL: https://github.com/apache/linkis/pull/4263#discussion_r1118086319


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -435,16 +435,92 @@
       </exclusions>
     </dependency>
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
       <scope>provided</scope>
     </dependency>
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
-      <scope>provided</scope>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   This dependency is not needed by Spark 3.2; it is only needed for Spark 2.4 working with Hadoop 3.





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4263:
URL: https://github.com/apache/linkis/pull/4263#discussion_r1118095979


##########
linkis-engineconn-plugins/spark/pom.xml:
##########
@@ -435,16 +435,92 @@
       </exclusions>
     </dependency>
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
+      <groupId>org.eclipse.jetty</groupId>
+      <artifactId>jetty-client</artifactId>
       <scope>provided</scope>
     </dependency>
     <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-hdfs</artifactId>
-      <version>${hadoop.version}</version>
-      <scope>provided</scope>
+      <groupId>org.apache.linkis</groupId>
+      <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
+      <version>${project.version}</version>

Review Comment:
   Yes, I will try a variable to make it apply only to Spark 2.4, and just keep hadoop-common and hadoop-hdfs as provided when the spark-2.4 profile is not active.
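
   For illustration, one possible way to scope this (a profile-added dependency rather than a property switch; this follows the discussion above and is not the final change):

   ```xml
   <!-- Illustrative sketch only: keep the plain Hadoop artifacts provided by default,
        and let the spark-2.4 profile add the shaded HDFS client for Hadoop 3. -->
   <dependencies>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-common</artifactId>
       <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-hdfs</artifactId>
       <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
   </dependencies>

   <profiles>
     <profile>
       <id>spark-2.4</id>
       <dependencies>
         <dependency>
           <groupId>org.apache.linkis</groupId>
           <artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
           <version>${project.version}</version>
         </dependency>
       </dependencies>
     </profile>
   </profiles>
   ```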





[GitHub] [linkis] GuoPhilipse commented on a diff in pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "GuoPhilipse (via GitHub)" <gi...@apache.org>.
GuoPhilipse commented on code in PR #4263:
URL: https://github.com/apache/linkis/pull/4263#discussion_r1125500055


##########
linkis-dist/helm/scripts/resources/ldh/configmaps/configmap-spark.yaml:
##########
@@ -141,7 +141,7 @@ data:
     spark.sql.autoBroadcastJoinThreshold 26214400
     spark.sql.hive.convertMetastoreOrc true
     spark.sql.hive.metastore.jars /opt/ldh/current/spark/jars/*
-    spark.sql.hive.metastore.version 2.3.3
+    spark.sql.hive.metastore.version 3.1.3

Review Comment:
   Spark 3.2.1 uses Hive metastore 2.3.9 by default; maybe we should change this to 2.3.9?





[GitHub] [linkis] peacewong commented on a diff in pull request #4263: [feat] upgrade hadoop/spark/hive default version to 3.x

Posted by "peacewong (via GitHub)" <gi...@apache.org>.
peacewong commented on code in PR #4263:
URL: https://github.com/apache/linkis/pull/4263#discussion_r1125443402


##########
linkis-computation-governance/linkis-client/linkis-cli/linkis-cli-application/src/main/java/org/apache/linkis/cli/application/operator/ujes/UJESConstants.java:
##########
@@ -41,6 +41,6 @@ public class UJESConstants {
 
   public static final int DEFAULT_PAGE_SIZE = 500;
 
-  public static final String DEFAULT_SPARK_ENGINE = "spark-2.4.3";
-  public static final String DEFAULT_HIVE_ENGINE = "hive-1.2.1";
+  public static final String DEFAULT_SPARK_ENGINE = "spark-3.2.1";

Review Comment:
   Can remove 



##########
linkis-computation-governance/linkis-computation-governance-common/src/main/scala/org/apache/linkis/governance/common/conf/GovernaceCommonConf.scala:
##########
@@ -23,9 +23,9 @@ object GovernanceCommonConf {
 
   val CONF_FILTER_RM = "wds.linkis.rm"
 
-  val SPARK_ENGINE_VERSION = CommonVars("wds.linkis.spark.engine.version", "2.4.3")
+  val SPARK_ENGINE_VERSION = CommonVars("wds.linkis.spark.engine.version", "3.2.1")

Review Comment:
   Should use LabelCommonConfig#HIVE_ENGINE_VERSION and LabelCommonConfig#SPARK_ENGINE_VERSION
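
   For illustration, a sketch of what this could look like in GovernanceCommonConf (assuming LabelCommonConfig exposes these defaults as CommonVars; the Hive config key name is also an assumption):

   ```scala
   // Illustrative sketch only: take the defaults from LabelCommonConfig instead of hard-coded literals.
   import org.apache.linkis.common.conf.CommonVars
   import org.apache.linkis.manager.label.conf.LabelCommonConfig

   object GovernanceCommonConf {

     val SPARK_ENGINE_VERSION =
       CommonVars("wds.linkis.spark.engine.version", LabelCommonConfig.SPARK_ENGINE_VERSION.getValue)

     val HIVE_ENGINE_VERSION =
       CommonVars("wds.linkis.hive.engine.version", LabelCommonConfig.HIVE_ENGINE_VERSION.getValue)
   }
   ```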



##########
linkis-computation-governance/linkis-client/linkis-computation-client/src/test/java/org/apache/linkis/computation/client/InteractiveJobTest.java:
##########
@@ -29,7 +29,7 @@ public static void main(String[] args) {
     SubmittableInteractiveJob job =
         LinkisJobClient.interactive()
             .builder()
-            .setEngineType("hive-2.3.3")
+            .setEngineType("hive-3.1.3")

Review Comment:
   Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#HIVE_ENGINE_VERSION



##########
linkis-dist/helm/charts/linkis/templates/configmap-init-sql.yaml:
##########
@@ -1183,12 +1183,12 @@ data:
     (select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
     INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = '*-*,*-*');
 
-    -- spark2.4.3 default configuration
+    -- spark3.2.1 default configuration
     insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_value`, `config_label_id`)
     (select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
     INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @SPARK_ALL);
 
-    -- hive1.2.1 default configuration
+    -- hive3.1.3 default configuration

Review Comment:
   can remove 3.1.3



##########
linkis-dist/package/db/linkis_dml.sql:
##########
@@ -189,18 +189,18 @@ insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_featur
 insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @PRESTO_ALL, 'OPTIONAL', 2, now(), now());
 insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @TRINO_ALL, 'OPTIONAL', 2, now(), now());
 
--- Custom correlation engine (e.g. spark-2.4.3) and configKey value
+-- Custom correlation engine (e.g. spark-3.2.1) and configKey value
 -- Global Settings
 insert into `linkis_ps_configuration_key_engine_relation` (`config_key_id`, `engine_type_label_id`)
 (select config.id as `config_key_id`, label.id AS `engine_type_label_id` FROM linkis_ps_configuration_config_key config
 INNER JOIN linkis_cg_manager_label label ON config.engine_conn_type is null and label.label_value = "*-*,*-*");
 
--- spark-2.4.3(Here choose to associate all spark type Key values with spark2.4.3)
+-- spark-3.2.1(Here choose to associate all spark type Key values with spark3.2.1)

Review Comment:
   can remove 3.2.1 and 3.1.3



##########
linkis-computation-governance/linkis-entrance/src/main/scala/org/apache/linkis/entrance/parser/CommonEntranceParser.scala:
##########
@@ -134,7 +134,7 @@ class CommonEntranceParser(val persistenceManager: PersistenceManager)
   private def checkEngineTypeLabel(labels: util.Map[String, Label[_]]): Unit = {
     val engineTypeLabel = labels.getOrDefault(LabelKeyConstant.ENGINE_TYPE_KEY, null)
     if (null == engineTypeLabel) {
-      val msg = s"You need to specify engineTypeLabel in labels, such as spark-2.4.3"
+      val msg = s"You need to specify engineTypeLabel in labels, such as spark-3.2.1"

Review Comment:
   Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#SPARK_ENGINE_VERSION
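
   For illustration, a sketch of deriving the example version in the hint from the configured default (assuming LabelCommonConfig.SPARK_ENGINE_VERSION is a CommonVars):

   ```scala
   // Illustrative sketch only: avoid hard-coding the example version in the message.
   import org.apache.linkis.manager.label.conf.LabelCommonConfig

   val msg =
     s"You need to specify engineTypeLabel in labels, such as spark-${LabelCommonConfig.SPARK_ENGINE_VERSION.getValue}"
   ```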



##########
linkis-computation-governance/linkis-manager/linkis-label-common/src/test/java/org/apache/linkis/manager/label/TestLabelBuilder.java:
##########
@@ -27,7 +27,7 @@ public class TestLabelBuilder {
 
   public static void main(String[] args) throws LabelErrorException {
     LabelBuilderFactory labelBuilderFactory = LabelBuilderFactoryContext.getLabelBuilderFactory();
-    Label<?> engineType = labelBuilderFactory.createLabel("engineType", "hive-1.2.1");
+    Label<?> engineType = labelBuilderFactory.createLabel("engineType", "hive-3.1.3");

Review Comment:
   Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#HIVE_ENGINE_VERSION
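
   For illustration, a sketch of the same idea in the Java test (assuming the constant is a CommonVars with getValue()):

   ```java
   // Illustrative sketch only: build the label value from the configured default instead of a literal.
   Label<?> engineType =
       labelBuilderFactory.createLabel(
           "engineType", "hive-" + LabelCommonConfig.HIVE_ENGINE_VERSION.getValue());
   ```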



##########
linkis-computation-governance/linkis-manager/linkis-manager-common/src/main/scala/org/apache/linkis/manager/common/conf/ManagerCommonConf.scala:
##########
@@ -23,7 +23,7 @@ object ManagerCommonConf {
 
   val DEFAULT_ENGINE_TYPE = CommonVars("wds.linkis.default.engine.type", "spark")
 
-  val DEFAULT_ENGINE_VERSION = CommonVars("wds.linkis.default.engine.version", "2.4.3")
+  val DEFAULT_ENGINE_VERSION = CommonVars("wds.linkis.default.engine.version", "3.2.1")

Review Comment:
   Can remove



##########
linkis-dist/helm/charts/linkis/templates/configmap-init-sql.yaml:
##########
@@ -1183,12 +1183,12 @@ data:
     (select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
     INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = '*-*,*-*');
 
-    -- spark2.4.3 default configuration
+    -- spark3.2.1 default configuration

Review Comment:
   can remove 3.2.1



##########
linkis-dist/helm/scripts/resources/ldh/configmaps/configmap-spark.yaml:
##########
@@ -141,7 +141,7 @@ data:
     spark.sql.autoBroadcastJoinThreshold 26214400
     spark.sql.hive.convertMetastoreOrc true
     spark.sql.hive.metastore.jars /opt/ldh/current/spark/jars/*
-    spark.sql.hive.metastore.version 2.3.3
+    spark.sql.hive.metastore.version 3.1.3

Review Comment:
   should use 2.3.3?


