Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/05/12 12:07:05 UTC

[GitHub] [hive] kgyrtkirk commented on a diff in pull request #3279: HIVE-24484: Upgrade Hadoop to 3.3.2.

kgyrtkirk commented on code in PR #3279:
URL: https://github.com/apache/hive/pull/3279#discussion_r871272145


##########
common/pom.xml:
##########
@@ -195,6 +194,11 @@
       <artifactId>tez-api</artifactId>
       <version>${tez.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.fusesource.jansi</groupId>
+      <artifactId>jansi</artifactId>
+      <version>2.3.4</version>

Review Comment:
   move version to root pom
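   for example, something along these lines in the root pom's `dependencyManagement` (just a sketch; whether to introduce a `<jansi.version>` property like we do for tez is up to you):

   ```xml
   <dependencyManagement>
     <dependencies>
       <dependency>
         <groupId>org.fusesource.jansi</groupId>
         <artifactId>jansi</artifactId>
         <version>2.3.4</version>
       </dependency>
     </dependencies>
   </dependencyManagement>
   ```

   the entry in `common/pom.xml` can then drop its `<version>` element.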



##########
itests/pom.xml:
##########
@@ -352,6 +352,12 @@
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-yarn-client</artifactId>
         <version>${hadoop.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>org.jline</groupId>
+            <artifactId>jline</artifactId>
+          </exclusion>

Review Comment:
   I'm not sure this fix will work: it may be enough for the tests, but you've only excluded the dependency here, and I don't think that will keep it off the classpath at runtime...

   have you tested a dist build as well?



##########
ql/src/java/org/apache/hadoop/hive/ql/io/RecordReaderWrapper.java:
##########
@@ -69,7 +70,14 @@ static RecordReader create(InputFormat inputFormat, HiveInputFormat.HiveInputSpl
       JobConf jobConf, Reporter reporter) throws IOException {
     int headerCount = Utilities.getHeaderCount(tableDesc);
     int footerCount = Utilities.getFooterCount(tableDesc, jobConf);
-    RecordReader innerReader = inputFormat.getRecordReader(split.getInputSplit(), jobConf, reporter);
+
+    RecordReader innerReader = null;
+    try {
+     innerReader = inputFormat.getRecordReader(split.getInputSplit(), jobConf, reporter);
+    } catch (InterruptedIOException iioe) {
+      // If reading from the underlying record reader is interrupted, return a no-op record reader

Review Comment:
   why not simply propagate the `Exception`?
   swallowing it here hides the failure from the caller
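   a minimal sketch of what I mean: since `InterruptedIOException` already extends `IOException`, `create()` can just let it propagate through its existing `throws IOException`:

   ```java
   // sketch: no special handling; an interruption surfaces to the caller as an IOException
   RecordReader innerReader =
       inputFormat.getRecordReader(split.getInputSplit(), jobConf, reporter);
   ```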



##########
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:
##########
@@ -178,8 +178,7 @@ public void authorize(Database db, Privilege[] readRequiredPriv, Privilege[] wri
 
   private static boolean userHasProxyPrivilege(String user, Configuration conf) {
     try {
-      if (MetaStoreServerUtils.checkUserHasHostProxyPrivileges(user, conf,
-              HMSHandler.getIPAddress())) {
+      if (MetaStoreServerUtils.checkUserHasHostProxyPrivileges(user, conf, HMSHandler.getIPAddress())) {

Review Comment:
   I think the max line length should be <= 100; are you using the `dev-support/eclipse-styles.xml`?



##########
streaming/src/test/org/apache/hive/streaming/TestStreaming.java:
##########
@@ -1317,6 +1318,11 @@ public void testTransactionBatchEmptyCommit() throws Exception {
     connection.close();
   }
 
+  /**
+   * Starting with HDFS 3.3.1, the underlying system NOW SUPPORTS hflush so this
+   * test fails.

Review Comment:
   ok; then I think this test could probably be converted into one that checks that it now works
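   not the actual streaming test, but just to illustrate the kind of positive assertion I mean (sketch only; `dfs` and the path are illustrative, standing in for whatever FileSystem the test already has):

   ```java
   // sketch: assert that hflush now succeeds and the flushed data is visible to a new reader
   Path p = new Path("/tmp/hflush_check");
   try (FSDataOutputStream out = dfs.create(p, true)) {
     out.writeBytes("row1\n");
     out.hflush();                              // should no longer throw on HDFS 3.3.1+
     byte[] buf = new byte[5];
     try (FSDataInputStream in = dfs.open(p)) {
       in.readFully(buf);
     }
     Assert.assertEquals("row1\n", new String(buf, StandardCharsets.UTF_8));
   }
   ```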



##########
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatMultiOutputFormat.java:
##########
@@ -315,18 +320,19 @@ public void testOutputFormat() throws Throwable {
 
     // Check permisssion on partition dirs and files created
     for (int i = 0; i < tableNames.length; i++) {
-      Path partitionFile = new Path(warehousedir + "/" + tableNames[i]
-        + "/ds=1/cluster=ag/part-m-00000");
-      FileSystem fs = partitionFile.getFileSystem(mrConf);
-      Assert.assertEquals("File permissions of table " + tableNames[i] + " is not correct",
-        fs.getFileStatus(partitionFile).getPermission(),
-        new FsPermission(tablePerms[i]));
-      Assert.assertEquals("File permissions of table " + tableNames[i] + " is not correct",
-        fs.getFileStatus(partitionFile.getParent()).getPermission(),
-        new FsPermission(tablePerms[i]));
-      Assert.assertEquals("File permissions of table " + tableNames[i] + " is not correct",
-        fs.getFileStatus(partitionFile.getParent().getParent()).getPermission(),
-        new FsPermission(tablePerms[i]));
+      final Path partitionFile = new Path(warehousedir + "/" + tableNames[i] + "/ds=1/cluster=ag/part-m-00000");
+      final Path grandParentOfPartitionFile = partitionFile.getParent();

Review Comment:
   I would expect `grandParent` to be the parent-of-parent;
   
   I think this change could be reverted - it was more readable before; the last assert now checks the `parent` dir and not `parent.parent`, and the second assert was also clobbered... roughly what I'd expect is sketched below.
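   to spell out the levels I mean (sketch):

   ```java
   // partitionFile: .../<table>/ds=1/cluster=ag/part-m-00000
   Path partitionDir = partitionFile.getParent();        // .../ds=1/cluster=ag
   Path grandParentDir = partitionDir.getParent();       // .../<table>/ds=1
   // the three asserts should check partitionFile, partitionDir and grandParentDir respectively
   ```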



##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOnHDFSEncryptedZones.java:
##########
@@ -123,57 +122,24 @@ public void targetAndSourceHaveDifferentEncryptionZoneKeys() throws Throwable {
               put(HiveConf.ConfVars.REPLDIR.varname, primary.repldDir);
             }}, "test_key123");
 
-    List<String> dumpWithClause = Arrays.asList(
-            "'hive.repl.add.raw.reserved.namespace'='true'",
-            "'" + HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname + "'='"
-                    + replica.externalTableWarehouseRoot + "'",
-            "'distcp.options.skipcrccheck'=''",
-            "'" + HiveConf.ConfVars.HIVE_SERVER2_ENABLE_DOAS.varname + "'='false'",
-            "'" + HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER.varname + "'='"
-                    + UserGroupInformation.getCurrentUser().getUserName() +"'");
-    WarehouseInstance.Tuple tuple =
-            primary.run("use " + primaryDbName)
-                    .run("create table encrypted_table (id int, value string)")
-                    .run("insert into table encrypted_table values (1,'value1')")
-                    .run("insert into table encrypted_table values (2,'value2')")
-                    .dump(primaryDbName, dumpWithClause);
-
-    replica
-            .run("repl load " + primaryDbName + " into " + replicatedDbName
-                    + " with('hive.repl.add.raw.reserved.namespace'='true', "
-                    + "'hive.repl.replica.external.table.base.dir'='" + replica.externalTableWarehouseRoot + "', "
-                    + "'hive.exec.copyfile.maxsize'='0', 'distcp.options.skipcrccheck'='')")
-            .run("use " + replicatedDbName)
-            .run("repl status " + replicatedDbName)
-            .verifyResult(tuple.lastReplicationId);
-
-    try {
-      replica
-              .run("select value from encrypted_table")
-              .verifyResults(new String[] { "value1", "value2" });
-      Assert.fail("Src EZKey shouldn't be present on target");
-    } catch (IOException e) {
-      Assert.assertTrue(e.getCause().getMessage().contains("KeyVersion name 'test_key@0' does not exist"));
-    }
-
     //read should pass without raw-byte distcp
-    dumpWithClause = Arrays.asList( "'" + HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname + "'='"
+    List<String> dumpWithClause = Arrays.asList( "'" + HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname + "'='"
             + replica.externalTableWarehouseRoot + "'");
-    tuple = primary.run("use " + primaryDbName)
+    WarehouseInstance.Tuple tuple =
+        primary.run("use " + primaryDbName)
             .run("create external table encrypted_table2 (id int, value string)")
             .run("insert into table encrypted_table2 values (1,'value1')")
             .run("insert into table encrypted_table2 values (2,'value2')")
             .dump(primaryDbName, dumpWithClause);
 
     replica
-            .run("repl load " + primaryDbName + " into " + replicatedDbName
-                    + " with('hive.repl.replica.external.table.base.dir'='" + replica.externalTableWarehouseRoot + "', "
-                    + "'hive.exec.copyfile.maxsize'='0', 'distcp.options.skipcrccheck'='')")
-            .run("use " + replicatedDbName)
-            .run("repl status " + replicatedDbName)
-            .verifyResult(tuple.lastReplicationId)

Review Comment:
   wasn't this the expected behaviour?



##########
storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java:
##########
@@ -18,10 +18,10 @@
 
 package org.apache.hadoop.hive.common;
 
-import org.apache.commons.lang.StringUtils;
+import org.apache.commons.lang3.StringUtils;

Review Comment:
   these lang/lang3 changes seem unrelated to me; I think they could be done in a separate jira to reduce the amount of work.
   
   if you are moving away from the usage of `org.apache.commons.lang`, could you please also ban it in the root pom.xml?
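   something along these lines in the root pom's enforcer configuration could do it (sketch only; the execution id is made up, and it may be better merged into any existing enforcer setup):

   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-enforcer-plugin</artifactId>
     <executions>
       <execution>
         <id>ban-commons-lang2</id>
         <goals>
           <goal>enforce</goal>
         </goals>
         <configuration>
           <rules>
             <bannedDependencies>
               <excludes>
                 <exclude>commons-lang:commons-lang</exclude>
               </excludes>
             </bannedDependencies>
           </rules>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```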



##########
standalone-metastore/pom.xml:
##########
@@ -227,6 +227,10 @@
         <artifactId>hadoop-mapreduce-client-core</artifactId>
         <version>${hadoop.version}</version>
         <exclusions>
+          <exclusion>
+            <groupId>org.jline</groupId>
+            <artifactId>jline</artifactId>
+          </exclusion>

Review Comment:
   this jline dep just creeps in from multiple hadoop artifacts; the best would be to upgrade jline itself rather than take our chances with exclusions
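   i.e. pin a newer jline once in the root pom's `dependencyManagement` so the transitive version is controlled in one place (sketch; `${jline.version}` is just a placeholder property):

   ```xml
   <dependency>
     <groupId>org.jline</groupId>
     <artifactId>jline</artifactId>
     <version>${jline.version}</version>
   </dependency>
   ```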



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##########
@@ -9361,7 +9362,8 @@ public NotificationEventsCountResponse get_notification_events_count(Notificatio
   private void authorizeProxyPrivilege() throws TException {
     // Skip the auth in embedded mode or if the auth is disabled
     if (!HiveMetaStore.isMetaStoreRemote() ||
-        !MetastoreConf.getBoolVar(conf, ConfVars.EVENT_DB_NOTIFICATION_API_AUTH)) {
+        !MetastoreConf.getBoolVar(conf, ConfVars.EVENT_DB_NOTIFICATION_API_AUTH) || conf.getBoolean(HIVE_IN_TEST.getVarname(),
+        false)) {

Review Comment:
   you are turning a feature off based on this `HIVE_IN_TEST` boolean, which means the feature will not be exercised by the regular hive tests; please find another way. and since it seems like this is being turned off in multiple places - can you cover it with a test?



##########
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOnHDFSEncryptedZones.java:
##########
@@ -123,57 +122,24 @@ public void targetAndSourceHaveDifferentEncryptionZoneKeys() throws Throwable {
               put(HiveConf.ConfVars.REPLDIR.varname, primary.repldDir);
             }}, "test_key123");
 
-    List<String> dumpWithClause = Arrays.asList(

Review Comment:
   seems like a testcase was removed; I wonder if this is not supported anymore? ...and why are we removing this case in the scope of a hadoop upgrade?



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:
##########
@@ -18,20 +18,8 @@
 
 package org.apache.hadoop.hive.ql.exec;
 
-import java.io.FileNotFoundException;

Review Comment:
   the import order from your IDE is different from the existing code; can you configure it to not reorder the imports in every file?
   
   I wonder if we have some agreement on what order we want to use....



