You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/18 15:37:50 UTC

[GitHub] [iceberg] marton-bod opened a new pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

marton-bod opened a new pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478


   This is a second iteration after an earlier, draft PR: https://github.com/apache/iceberg/pull/1455
   
   The goal of this PR is to enable building the Iceberg repository using Hive3 and Hadoop3.
   The build can be launched with a property flag:
   `gradle build -Dhive3`
   This will pull in Hive3 and Hadoop3 as dependencies and build all modules, except for Spark and Flink:
   - iceberg-spark
   - iceberg-spark2
   - iceberg-spark-runtime
   - iceberg-spark3
   - iceberg-spark3-runtime
   - iceberg-flink
   - iceberg-flink-runtime
   
   The advantage of this approach is that we can run the build for the entire repository with the hive3 flag as above, but it introduces some separation in the gradle build file as to which projects should be included (based on the hive3Enabled flag). The alternative approach would be to only allow hive3 builds for certain submodules, e.g. 
   `gradle :iceberg-mr:build :iceberg-hive-metastore:build :iceberg-hive-runtime:build <...others> -Dhive3`
   but this would mean that the above command (`gradle build -Dhive3`) would fail, and anytime we enable a new module to work with Hive3, the CI commands would have to be updated to test for that as well. I'm interested in your opinion as to which approach you prefer.
   
   If the hive3 build flag is not used, everything should stay the same as before, with no changes to the build output (except that there is a new module called iceberg-mr-hive3).
   
   (Note 1: Hive2- and Hive3-specific parts have been moved from iceberg-mr into iceberg-mr-hive2/iceberg-mr-hive3)
   
   (Note 2: currently the hive3 build works only for Java 8, there are some exceptions on Java11 regarding HiveRunner and metastore initialization. Therefore hive3 builds are disabled on Java11 for now. This should be addressed in a separate patch.)
   
   (Note 3: for some reason the build.gradle diff shows a lot of output, but most of the changes in that file are only indentation, since the Spark and Flink subprojects have been wrapped into if blocks to skip building them when Hive3 dependencies are enabled. IntelliJ seems to handle this more intelligently.)
   
   Once this has been merged, we should think about integrating the Java8 hive3 build into our CI pipeline. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695809197


   Thanks @marton-bod for your comment. Is it possible, then, to only have hive3 dependencies under `iceberg-mr-hive3` instead of building other modules too with hive3 and hadoop3?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698263424


   Thanks @rdblue for your comment. I will look into refactoring the solution to use your suggested approach, which I like. 
   
   My only concern is that because there is a breaking change in the metastore API between Hive2 and 3, there will have to be two `iceberg-hive-metastore` module versions, one for Hive3 (`iceberg-mr-hive3` would use this for the HiveCatalog tests) and one for Hive2 (the rest of the modules would use this). Not sure, but hopefully it's possible to create a second metastore module in Gradle pointing to the same source files, but using different dependency versions


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695490407


   Hi @marton-bod thank you for working on this. I have a few questions
   
   1. I think `hive-metastore` module should be unaffected whether we are running with hive2 or hive3 as `hive-metastore` only uses Metatstoreclient API and I've seen that Metastore upgrades are always backward compatible. I believe Hive metastore 3 should be backward compatible with Hive metastore client 2.  In that respect we could keep using `hive-metastore` module as is. @pvary, @rdblue   thoughts?
   2. For the hive code in `mr` module which requires changes because of hive2/hive3 artifacts.  Is it possible to move the hive specific code from `mr` module and have 3 modules [similar to spark2/spark3]  - `hive-exec` for common code, `hive-exec2` and `hive-exec3` for incompatible code. This way our `mr` modules is simplified and it is also aligned with our discussion on the mailing list to have an `hive-exec` module.  This is easier said than done since it will mean we might have to ship different runtime jars. So best to also get agreement from folks like @rdblue @massdosage and @pvary .
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496949231



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -72,7 +85,7 @@ public ObjectInspector primitive(Type.PrimitiveType primitiveType) {
         primitiveTypeInfo = TypeInfoFactory.booleanTypeInfo;
         break;
       case DATE:
-        return IcebergDateObjectInspector.get();
+        return DATE_INSPECTOR.invoke();

Review comment:
       Minor: we could invoke this just once and store it in a `private static final` field. Same with timestamps. We could store both in fields and choose one using `adjustToUtc`. Not a big deal to always call the reflection method here, though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496843942



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private MetastoreUtil() {
+

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496311893



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
##########
@@ -80,6 +81,15 @@ public void stop() {
     if (hiveLocalDir != null) {
       hiveLocalDir.delete();
     }
+
+    // remove raw store if exists
+    try {
+      Method cleanupRawStore = HiveMetaStore.class.getDeclaredMethod("cleanupRawStore");
+      cleanupRawStore.setAccessible(true);
+      cleanupRawStore.invoke(null);
+    } catch (Exception e) {
+      // no op
+    }

Review comment:
       Could you also rewrite this one to use the `DynMethods` helpers?
   
   ```java
     private static final DynMethods.StaticMethod CLEAN_RAW_STORE = DynMethods.builder("cleanupRawStore")
         .hiddenImpl(HiveMetaStore.class)
         .orNoop()
         .buildStatic();
   
       CLEAN_RAW_STORE.invoke();
   ```
   
   That avoids needing a try/catch block since you can replace the method with a no-op if it isn't found. And `hiddenImpl` will make it accessible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698263424


   Thanks @rdblue for your comment. I will look into refactoring the solution to use your suggested approach, which I like. 
   
   My only concern is that because there is a breaking change in the metastore API between Hive2 and 3, there will have to be two `iceberg-hive-metastore` module versions, one for Hive3 (`iceberg-mr-hive3` would use this for the HiveCatalog tests) and one for Hive2 (the rest of the modules would use this). Not sure, but hopefully it's possible to create a second metastore module in Gradle pointing to the same source files, but using different dependency versions


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-701695790


   Looks like there are some tests failures:


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496770771



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       So HIve 2 works with newer versions of Java than Hive 3 does? :facepalm: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497494476



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -72,7 +85,7 @@ public ObjectInspector primitive(Type.PrimitiveType primitiveType) {
         primitiveTypeInfo = TypeInfoFactory.booleanTypeInfo;
         break;
       case DATE:
-        return IcebergDateObjectInspector.get();
+        return DATE_INSPECTOR.invoke();

Review comment:
       saved them into `private static final` fields




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-701466352


   Thanks @rdblue once again for the improvement suggestions - I've addressed them.
   As for JDK11, we do have test failures unfortunately (https://github.com/apache/iceberg/pull/1478#discussion_r497471538) therefore we'll need to go with JDK8 only for Hive3.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496943192



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Do tests actually fail with JDK 11? Or is this because they might fail?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-700362249


   Nice work, @marton-bod! This looks a lot better and I think we will be able to merge it soon. We should clean it up further by using the reflection helpers in iceberg-common first, though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496311174



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private MetastoreUtil() {
+
+  }
+
+  /**
+   * @return true if Hive3 dependencies are found on classpath, false otherwise
+   */
+  public static boolean hive3PresentOnClasspath() {

Review comment:
       Can this use a private static variable with the value? That way we don't need to attempt to load the class every time this is called.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-700901054


   Thank you, @marton-bod! Looking good but there are a few more issues. Mostly, I'd prefer not to parse values in tests.
   
   And, if tests pass in Java 11, I think we should go ahead an keep the module in Java 11 as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496310879



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java
##########
@@ -41,7 +41,16 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected HiveMetaStoreClient newClient()  {
     try {
-      return new HiveMetaStoreClient(hiveConf);
+      // create the metastore client based on whether we're working with Hive2 or Hive3 dependencies
+      // we need to do this because there is a breaking API change between Hive2 and Hive3
+      if (MetastoreUtil.hive3PresentOnClasspath()) {
+        return (HiveMetaStoreClient) Class
+                .forName(HiveMetaStoreClient.class.getName())
+                .getConstructor(Configuration.class)
+                .newInstance(hiveConf);

Review comment:
       Can you use `DynCtors` instead? I think this can be simpler:
   
   ```
     private static final DynConstructors.Ctor<HiveMetaStoreClient> CLIENT_CTOR = DynConstructors.builder()
         .impl(HiveMetaStoreClient.class, HiveConf.class)
         .impl(HiveMetaStoreClient.class, Configuration.class)
         .build();
   
     protected HiveMetaStoreClient newClient()  {
       try {
         return CLIENT_CTOR.newInstance(hiveConf);
       } catch (...) {
         ...
       }
     }
   ```
   
   This also exposes a bug with the reflection path: `MetaException` is no longer thrown in the `try` block. Since that's a checked exception, it will be wrapped in a `RuntimeException`. You'll need to replace that block with `catch (RuntimeException e) { ... }` and check the cause of the exception for a `MetaException`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496308409



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Why only in Java 8?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497471538



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Yes, all the `TestHiveIcebergStorageHandlerWith*` tests in `iceberg-hive3` fail on JDK11 with:
   ```
   java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader
   ```
   This issue is fixed in https://issues.apache.org/jira/browse/HIVE-21584, but it's not part of the Hive 3.1.2 release




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497471538



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Yes, all the `TestHiveIcebergStorageHandlerWith*` tests in `iceberg-hive3` fail on JDK11 with:
   ```
   java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader
   ```
   This issue is fixed in https://issues.apache.org/jira/browse/HIVE-21584, but it's not part of the official Hive 3.1.2 release, it's only in master as of now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r493815788



##########
File path: settings.gradle
##########
@@ -17,44 +17,64 @@
  * under the License.
  */
 
+gradle.ext.hive3Enabled = false
+if (System.getProperty("hive3") != null) {
+  if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+    println "*** 'hive3' flag detected - building with Hive3 / Hadoop3 ***"
+    gradle.ext.hive3Enabled = true
+  } else {
+    println "*** 'hive3' flag detected, but with JDK version other than 1.8 - skipping Hive3 / Hadoop3 build. ***"
+  }
+}
+
 rootProject.name = 'iceberg'
 include 'api'
 include 'common'
 include 'core'
 include 'data'
-include 'flink'
-include 'flink-runtime'
 include 'mr'
 include 'hive-runtime'
 include 'orc'
 include 'arrow'
 include 'parquet'
 include 'bundled-guava'
-include 'spark'
-include 'spark3'
-include 'spark3-runtime'
 include 'pig'
 include 'hive-metastore'
+if (gradle.ext.hive3Enabled) {
+  include 'mr-hive3'
+} else {
+  include 'mr-hive2'

Review comment:
       The reason I excluded Spark is that some unit tests were failing with Hive3/Hadoop3, since `iceberg-spark2`/`iceberg-spark3` pull in Hive2 as a transitive dependency via `org.apache.spark:spark-hive_2.11/2.12` and it fails to communicate with the Hive3 metastore in the tests. Flink does pass all unit tests but did not want to include it yet so that they don't inadvertently break the Hadoop3 tests - although I'm definitely open to including it if the community supports having a Hadoop3 version of Flink as well.
    
   I agree that hadoop3 could be a better flag, thanks for pointing it out.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496312345



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergTimestampObjectInspectorHive3.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDateTime;
+import java.time.OffsetDateTime;
+import java.time.ZoneOffset;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.TimestampObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+
+public abstract class IcebergTimestampObjectInspectorHive3 extends AbstractPrimitiveJavaObjectInspector
+                                                      implements TimestampObjectInspector {
+
+  private static final IcebergTimestampObjectInspectorHive3 INSTANCE_WITH_ZONE =
+      new IcebergTimestampObjectInspectorHive3() {
+        @Override
+        LocalDateTime toLocalDateTime(Object o) {
+          return ((OffsetDateTime) o).toLocalDateTime();
+        }
+  };

Review comment:
       This indentation might be off, since it corresponds to the `new IcebergTimestampObjectInspectorHive3() {` line.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496312187



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspectorHive3.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspectorHive3 extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {

Review comment:
       The continuation indent here should be 2 indents or 4 spaces from the start of the previous line.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497488281



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -27,12 +27,25 @@
 import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.common.DynMethods;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public final class IcebergObjectInspector extends TypeUtil.SchemaVisitor<ObjectInspector> {
 
+  // get the correct inspectors depending on whether we're working with Hive2 or Hive3 dependencies
+  // we need to do this because there is a breaking API change in Date/TimestampObjectInspector between Hive2 and Hive3
+  private static final DynMethods.StaticMethod DATE_INSPECTOR = DynMethods.builder("get")
+      .impl("org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3", null)
+      .impl(IcebergDateObjectInspector.class, null)

Review comment:
       yeah, good idea to be defensive




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r493661582



##########
File path: settings.gradle
##########
@@ -17,44 +17,64 @@
  * under the License.
  */
 
+gradle.ext.hive3Enabled = false
+if (System.getProperty("hive3") != null) {
+  if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+    println "*** 'hive3' flag detected - building with Hive3 / Hadoop3 ***"
+    gradle.ext.hive3Enabled = true
+  } else {
+    println "*** 'hive3' flag detected, but with JDK version other than 1.8 - skipping Hive3 / Hadoop3 build. ***"
+  }
+}
+
 rootProject.name = 'iceberg'
 include 'api'
 include 'common'
 include 'core'
 include 'data'
-include 'flink'
-include 'flink-runtime'
 include 'mr'
 include 'hive-runtime'
 include 'orc'
 include 'arrow'
 include 'parquet'
 include 'bundled-guava'
-include 'spark'
-include 'spark3'
-include 'spark3-runtime'
 include 'pig'
 include 'hive-metastore'
+if (gradle.ext.hive3Enabled) {
+  include 'mr-hive3'
+} else {
+  include 'mr-hive2'

Review comment:
       Why have we excluded these modules from the build? Seems like if the hive3 flag is used, none of these modules/artifacts will be published. Also, in the future, would we want to publish these modules compiled with hadoop3 artifacts if yes then it makes sense to build all modules, I think.
   What are your thoughts on the flag name, should it be hadoop3 or hive3. I feel like this flag is more general and affects Parquet, ORC, MR as you mentioned before.

##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {

Review comment:
       nit: we can possibly make this a var which can be set  when we check `hive3Enabled` to either `:iceberg-mr-hive2` or `:iceberg-mr-hive3` . This would the  remove if/else branches 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r491911188



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde")
+    }
+  }
+} else {
+  project(':iceberg-mr-hive3') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde:3.1.2")

Review comment:
       Is this version not set in `versions-hive3.props` already? Or is there some reason you need to explicitly set it here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698263424


   Thanks @rdblue for your comment. I will look into refactoring the solution to use your suggested approach, which I like. 
   
   My only concern is that because there is a breaking change in the metastore API between Hive2 and 3, there will have to be two `iceberg-hive-metastore` module versions, one for Hive3 (`iceberg-mr-hive3` would use this for the HiveCatalog tests) and one for Hive2 (the rest of the modules would use this). Not sure, but hopefully it's possible to create a second metastore module in Gradle pointing to the same source files, but using different dependency versions.
   
   The other things I'm thinking is that if the hive2 specific parts are not factored out from `iceberg-mr`, when `iceberg-mr-hive3` pulls that in as a dependency, but using Hive3, those couple of ObjectInspector classes would not compile


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698263424


   Thanks @rdblue for your comment. I will look into refactoring the solution to use your suggested approach, which I like. 
   
   My only concern is that because there is a breaking change in the metastore API between Hive2 and 3, there will have to be two `iceberg-hive-metastore` module versions, one for Hive3 (`iceberg-mr-hive3` would use this for the HiveCatalog tests) and one for Hive2 (the rest of the modules would use this). Not sure, but hopefully it's possible to create a second metastore module in Gradle pointing to the same source files, but using different dependency versions.
   
   The other things I'm thinking is that if the hive2 specific parts are not factored out from `iceberg-mr`, when `iceberg-mr-hive3` pulls that in as a dependency, but using Hive3, those couple of ObjectInspector classes would not compile


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   In order to build `iceberg-mr`/`iceberg-mr-hive3`, we need to build the other iceberg subprojects with Hive3/Hadoop3 as well which `mr` depends on, such as `hive-metastore`, `core`, `parquet`, etc. For example, there is an HMSHandler API change between Hive2 and Hive3, therefore if we didn't build the `hive-metastore` module with Hive3, `mr` (with Hive3) would not be able to communicate with it.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and when this flag's enabled, only those modules are included that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although they can also be added by the community whenever suitable)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496843815



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java
##########
@@ -41,7 +41,16 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected HiveMetaStoreClient newClient()  {
     try {
-      return new HiveMetaStoreClient(hiveConf);
+      // create the metastore client based on whether we're working with Hive2 or Hive3 dependencies
+      // we need to do this because there is a breaking API change between Hive2 and Hive3
+      if (MetastoreUtil.hive3PresentOnClasspath()) {
+        return (HiveMetaStoreClient) Class
+                .forName(HiveMetaStoreClient.class.getName())
+                .getConstructor(Configuration.class)
+                .newInstance(hiveConf);

Review comment:
       makes sense, thanks for the suggestion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698497777


   > there is a breaking change in the metastore API between Hive2 and 3
   
   What was the incompatibility? Ideally, we will handle it with reflection to avoid needing a different module.
   
   > if the hive2 specific parts are not factored out from iceberg-mr, when iceberg-mr-hive3 pulls that in as a dependency, but using Hive3, those couple of ObjectInspector classes would not compile
   
   The classes are already compiled. We just have to avoid loading them. So we would use different class names for the inspectors between 2 and 3 and load the correct one using reflection depending on whether the detected Hive version in 2 or 3.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
massdosage commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-697280721


   > @rdblue @massdosage @rdsr
   > Do you guys think we can go ahead with this change to enable Hive 3 builds?
   > Let me know if you have any suggestions, I'm open to discussion and changes. Thank you!
   
   Since this is optional and from what I can tell works fine with Hive 2 by default I'm OK with it. As I said above, I don't know enough about Gradle to really say whether there is a better way of achieving this so I'll defer to the others.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497472515



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private static Boolean hive3PresentOnClasspath = null;

Review comment:
       yep, good idea




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   In order to build `iceberg-mr`/`iceberg-mr-hive3`, we need to build the other iceberg subprojects with Hive3/Hadoop3 as well which `mr` depends on, such as `hive-metastore`, `core`, `parquet`, etc. For example, there is a metastore client API change between Hive2 and Hive3, therefore if we didn't build the `hive-metastore` module with Hive3, `mr` (with Hive3) would not be able to communicate with it.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and only a subset of modules will be built when this flag is enabled, only those modules that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although both Spark3 and Flink could be included at any point if the community agreed, since they pass all the tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695490407


   Hi @marton-bod thank you for working on this. I have a few questions
   
   1. I think {{hive-metastore}} module should be unaffected whether we are running with hive2 or hive3 as {{hive-metastore}} only uses Metatstoreclient API and I've seen that Metastore upgrades are always backward compatible. I believe Hive metastore 3 should be backward compatible with Hive metastore client 2.  In that respect we could keep using {{hive-metastore}} module as is. @pvary, @rdblue   thoughts?
   2. For the hive code in {{mr}} module which requires changes because of hive2/hive3 artifacts.  Is it possible to move the hive specific code from {{mr}} module and have 2/3 modules [similar to spark2/spark3]  - {{hive-exec}} for common code, {{hive-exec2}} and {{hive-exec3}} for incompatible code. This way our {{mr}} modules is simplified and it is also aligned to our discussion on the mailing list to have an {{hive-exec}} module.  This is easier said than done since it could mean we might have to ship different runtime jars. So best to get agreement from folks like @rdblue @massdosage and @pvary .
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497521397



##########
File path: hive3/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergTimestampObjectInspectorHive3.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDateTime;
+import java.time.OffsetDateTime;
+import java.time.ZoneOffset;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.TimestampObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestIcebergTimestampObjectInspectorHive3 {
+
+  @Test
+  public void testIcebergTimestampObjectInspector() {
+    TimestampObjectInspector oi = IcebergTimestampObjectInspectorHive3.get(false);
+
+    Assert.assertEquals(ObjectInspector.Category.PRIMITIVE, oi.getCategory());
+    Assert.assertEquals(PrimitiveObjectInspector.PrimitiveCategory.TIMESTAMP, oi.getPrimitiveCategory());
+
+    Assert.assertEquals(TypeInfoFactory.timestampTypeInfo, oi.getTypeInfo());
+    Assert.assertEquals(TypeInfoFactory.timestampTypeInfo.getTypeName(), oi.getTypeName());
+
+    Assert.assertEquals(Timestamp.class, oi.getJavaPrimitiveClass());
+    Assert.assertEquals(TimestampWritableV2.class, oi.getPrimitiveWritableClass());
+
+    Assert.assertNull(oi.copyObject(null));
+    Assert.assertNull(oi.getPrimitiveJavaObject(null));
+    Assert.assertNull(oi.getPrimitiveWritableObject(null));
+
+    LocalDateTime local = LocalDateTime.of(2020, 1, 1, 0, 0, 1, 500);
+    Timestamp ts = Timestamp.valueOf("2020-01-01 00:00:01.00000050");

Review comment:
       sure

##########
File path: hive3/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergDateObjectInspectorHive3.java
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestIcebergDateObjectInspectorHive3 {
+
+  @Test
+  public void testIcebergDateObjectInspector() {
+    DateObjectInspector oi = IcebergDateObjectInspectorHive3.get();
+
+    Assert.assertEquals(ObjectInspector.Category.PRIMITIVE, oi.getCategory());
+    Assert.assertEquals(PrimitiveObjectInspector.PrimitiveCategory.DATE, oi.getPrimitiveCategory());
+
+    Assert.assertEquals(TypeInfoFactory.dateTypeInfo, oi.getTypeInfo());
+    Assert.assertEquals(TypeInfoFactory.dateTypeInfo.getTypeName(), oi.getTypeName());
+
+    Assert.assertEquals(Date.class, oi.getJavaPrimitiveClass());
+    Assert.assertEquals(DateWritableV2.class, oi.getPrimitiveWritableClass());
+
+    Assert.assertNull(oi.copyObject(null));
+    Assert.assertNull(oi.getPrimitiveJavaObject(null));
+    Assert.assertNull(oi.getPrimitiveWritableObject(null));
+
+    LocalDate local = LocalDate.of(2020, 1, 1);
+    Date date = Date.valueOf("2020-01-01");

Review comment:
       sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496947819



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -27,12 +27,25 @@
 import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.common.DynMethods;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public final class IcebergObjectInspector extends TypeUtil.SchemaVisitor<ObjectInspector> {
 
+  // get the correct inspectors depending on whether we're working with Hive2 or Hive3 dependencies
+  // we need to do this because there is a breaking API change in Date/TimestampObjectInspector between Hive2 and Hive3
+  private static final DynMethods.StaticMethod DATE_INSPECTOR = DynMethods.builder("get")
+      .impl("org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3", null)

Review comment:
       Is `null` needed? I think before, I've always just omitted the classes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   I suppose it might work, but I think it would give us more useful test coverage if the other iceberg subprojects, which `mr` depends on, are built using Hive3/Hadoop3 as well. For instance, most likely if the `mr` code uses Hive3, then the metastore it connects to will use Hive3 as well, so their interoperability should be tested. Let me know what you think.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and when this flag's enabled, only those modules are included that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although both Spark3 and Flink could be included at any point if the community agreed, since they already pass all the tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r493699459



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {

Review comment:
       nit: we can possibly make this a var which can be set  when we check `hive3Enabled` to either `:iceberg-mr-hive2` or `:iceberg-mr-hive3` . This would the  remove if/else branches, here and at line 441 also




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   I suppose it might work, but I think it would give us more useful test coverage if the other iceberg subprojects, which `mr` depends on (metastore, core, parquet, etc), are built using Hive3/Hadoop3 as well. For instance, most likely if the `mr` code uses Hive3, then the metastore it connects to will use Hive3 as well, so their interoperability should be tested. Let me know what you think.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and when this flag's enabled, only those modules are included that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although both Spark3 and Flink could be included at any point if the community agreed, since they already pass all the tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-701696521


   +1, we just need to get the tests working.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496952388



##########
File path: hive3/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergDateObjectInspectorHive3.java
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestIcebergDateObjectInspectorHive3 {
+
+  @Test
+  public void testIcebergDateObjectInspector() {
+    DateObjectInspector oi = IcebergDateObjectInspectorHive3.get();
+
+    Assert.assertEquals(ObjectInspector.Category.PRIMITIVE, oi.getCategory());
+    Assert.assertEquals(PrimitiveObjectInspector.PrimitiveCategory.DATE, oi.getPrimitiveCategory());
+
+    Assert.assertEquals(TypeInfoFactory.dateTypeInfo, oi.getTypeInfo());
+    Assert.assertEquals(TypeInfoFactory.dateTypeInfo.getTypeName(), oi.getTypeName());
+
+    Assert.assertEquals(Date.class, oi.getJavaPrimitiveClass());
+    Assert.assertEquals(DateWritableV2.class, oi.getPrimitiveWritableClass());
+
+    Assert.assertNull(oi.copyObject(null));
+    Assert.assertNull(oi.getPrimitiveJavaObject(null));
+    Assert.assertNull(oi.getPrimitiveWritableObject(null));
+
+    LocalDate local = LocalDate.of(2020, 1, 1);
+    Date date = Date.valueOf("2020-01-01");

Review comment:
       Is it possible to initialize these using days from epoch instead of parsing? I don't trust date/time objects that parse and translate.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-701695790


   Looks like there are some tests failures:
   ```
   org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithHadoopCatalog > testScanTable FAILED
       java.lang.RuntimeException: Cannot start TestHiveMetastore
           Caused by:
           java.lang.RuntimeException: MetaException(message:Persistence Manager has been closed)
               Caused by:
               MetaException(message:Persistence Manager has been closed)
                   Caused by:
                   MetaException(message:Persistence Manager has been closed)
                       Caused by:
                       javax.jdo.JDOFatalUserException: Persistence Manager has been closed
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698033854


   Thanks for working on this, @marton-bod! And sorry for the delay in replying on this. I was initially focused on trying to avoid the problem of needing both hive 2 and hive 3 modules, but I don't see a way around it because the OI interfaces now specify incompatible objects. So I agree that we will need additional modules to handle this.
   
   But, I think there are some ways to simplify the changes this introduces. Because the changes needed between 2 and 3 are minor, I think the goal should be to produce a single iceberg-hive-runtime Jar that works in both versions. To do that, we need to build a `DateObjectInspector` for Hive 2 and one for Hive 3, but return the correct one at runtime using reflection. That way, we detect whether to use Hive 2 or 3 (e.g., by checking if `DateWritableV2` exists) and return an OI that matches. The other class would not be loaded, so it would not cause any issues.
   
   I think that we can achieve this using just one new module, iceberg-hive3, that adds the new object inspectors. The other module could continue to depend on Hive 2.
   
   I'd like to avoid selecting `versions.props` based on flags, so we would ideally just embed the Hive 3 version in the iceberg-hive3 dependency. That module should also depend on the iceberg-mr module so that it can run the same tests (maybe it can set the test source directory to share with hive2?). This module would have both hive 2 and hive 3 classes in its test classpath, so it would validate that having both doesn't break Hive. And, since Hadoop it always pulled in as a test dependency, it can use Hadoop 3.
   
   Finally, the iceberg-hive-runtime module would pull in both iceberg-hive and iceberg-hive3 so that all of the classes are in our runtime module.
   
   I think this would greatly simplify the support:
   1. It would add only one new module
   2. It would avoid using build flags
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698106959


   > Thanks for working on this, @marton-bod! And sorry for the delay in replying on this. I was initially focused on trying to avoid the problem of needing both hive 2 and hive 3 modules, but I don't see a way around it because the OI interfaces now specify incompatible objects. So I agree that we will need additional modules to handle this.
   > 
   > But, I think there are some ways to simplify the changes this introduces. Because the changes needed between 2 and 3 are minor, I think the goal should be to produce a single iceberg-hive-runtime Jar that works in both versions. To do that, we need to build a `DateObjectInspector` for Hive 2 and one for Hive 3, but return the correct one at runtime using reflection. That way, we detect whether to use Hive 2 or 3 (e.g., by checking if `DateWritableV2` exists) and return an OI that matches. The other class would not be loaded, so it would not cause any issues.
   > 
   > I think that we can achieve this using just one new module, iceberg-hive3, that adds the new object inspectors. The other module could continue to depend on Hive 2.
   > 
   > I'd like to avoid selecting `versions.props` based on flags, so we would ideally just embed the Hive 3 version in the iceberg-hive3 dependency. That module should also depend on the iceberg-mr module so that it can run the same tests (maybe it can set the test source directory to share with hive2?). This module would have both hive 2 and hive 3 classes in its test classpath, so it would validate that having both doesn't break Hive. And, since Hadoop it always pulled in as a test dependency, it can use Hadoop 3.
   > 
   > Finally, the iceberg-hive-runtime module would pull in both iceberg-hive and iceberg-hive3 so that all of the classes are in our runtime module.
   > 
   > I think this would greatly simplify the support:
   > 
   > 1. It would add only one new module
   > 2. It would avoid using build flags
   > 
   > What do you think?
   
   +1. This seems like a much cleaner approach if we can get this working!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r492380065



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspector.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspector extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {
+
+  private static final IcebergDateObjectInspector INSTANCE = new IcebergDateObjectInspector();
+
+  public static IcebergDateObjectInspector get() {
+    return INSTANCE;
+  }
+
+  private IcebergDateObjectInspector() {
+    super(TypeInfoFactory.dateTypeInfo);
+  }
+
+  @Override
+  public Date getPrimitiveJavaObject(Object o) {
+    if (o == null) {
+      return null;
+    }
+    LocalDate date = (LocalDate) o;
+    return Date.ofEpochDay((int) date.toEpochDay());

Review comment:
       Good point, thanks, I will change to use the same method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496946840



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -27,12 +27,25 @@
 import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.common.DynMethods;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public final class IcebergObjectInspector extends TypeUtil.SchemaVisitor<ObjectInspector> {
 
+  // get the correct inspectors depending on whether we're working with Hive2 or Hive3 dependencies
+  // we need to do this because there is a breaking API change in Date/TimestampObjectInspector between Hive2 and Hive3
+  private static final DynMethods.StaticMethod DATE_INSPECTOR = DynMethods.builder("get")
+      .impl("org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3", null)
+      .impl(IcebergDateObjectInspector.class, null)

Review comment:
       It is safer to use the class name string instead of the class here, so that we know it won't be loaded and break at runtime. I think that the Hive 3 tests will exercise this case, but it would still be easier to be careful.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r491957949



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde")
+    }
+  }
+} else {
+  project(':iceberg-mr-hive3') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde:3.1.2")

Review comment:
       Thanks a lot for checking @massdosage! Indeed, it's already set in versions-hive3, so I'll remove this. 

##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspector.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspector extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {
+
+  private static final IcebergDateObjectInspector INSTANCE = new IcebergDateObjectInspector();
+
+  public static IcebergDateObjectInspector get() {
+    return INSTANCE;
+  }
+
+  private IcebergDateObjectInspector() {
+    super(TypeInfoFactory.dateTypeInfo);
+  }
+
+  @Override
+  public Date getPrimitiveJavaObject(Object o) {
+    if (o == null) {
+      return null;
+    }
+    LocalDate date = (LocalDate) o;
+    return Date.ofEpochDay((int) date.toEpochDay());

Review comment:
       Good point, thanks, I will change to use the same method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-700143304


   @rdblue @rdsr @massdosage 
   Thanks a lot for your very valuable comments from last week!
   I have reworked the solution according to the proposal by @rdblue, and it came out simpler and better I think. I would greatly appreciate it if you could take a look at it once again. 
   Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   In order to build iceberg-mr/iceberg-mr-hive3, we need to build the other iceberg subprojects with Hive3/Hadoop3 as well which mr depends on, such as hive-metastore, core, parquet, etc. For example, there is an HMSHandler API change between Hive2 and Hive3, therefore if we didn't build the hive-metastore module with Hive3, mr (with Hive3) would not be able to communicate with it.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and when this flag's enabled, only those modules are included that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although they can also be added by the community whenever suitable)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r491957949



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde")
+    }
+  }
+} else {
+  project(':iceberg-mr-hive3') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde:3.1.2")

Review comment:
       Thanks a lot for checking @massdosage! Indeed, it's already set in versions-hive3, so I'll remove this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698497777


   > there is a breaking change in the metastore API between Hive2 and 3
   
   What was the incompatibility? Ideally, we will handle it with reflection to avoid needing a different module.
   
   > if the hive2 specific parts are not factored out from iceberg-mr, when iceberg-mr-hive3 pulls that in as a dependency, but using Hive3, those couple of ObjectInspector classes would not compile
   
   The classes are already compiled. We just have to avoid loading them. So we would use different class names for the inspectors between 2 and 3 and load the correct one using reflection depending on whether the detected Hive version in 2 or 3.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-704961465


   @rdblue Thanks for the suggestion! As per that, I've changed the code to create a metastore instance only once per test class to make things faster.
   
   The intermittent test failure was due to the fact that, in Hive3, the hive conf properties you set on the HiveRunner shell were not passed down from HiveRunner to all the threads spawned inside it (e.g. `HiveMaterializedViewsRegistry`) during HS2 initialization. This meant that these threads were calling`ObjectStore#setConf` with a different set of jdo properties than the rest, leading to the closure and replacement of the global `ObjectStore#PersistenceManagerFactory` instance whenever a config change was detected (-- resulting in intermittent "Persistence Manager has been closed" errors for those threads that would still be using their cached, but now closed PMs). Fortunately, during `HiveConf` construction, all the system properties are surveyed and included so we can ensure that the `metastore_uris` property is properly propagated and get the desired behaviour.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r492325706



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspector.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspector extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {
+
+  private static final IcebergDateObjectInspector INSTANCE = new IcebergDateObjectInspector();
+
+  public static IcebergDateObjectInspector get() {
+    return INSTANCE;
+  }
+
+  private IcebergDateObjectInspector() {
+    super(TypeInfoFactory.dateTypeInfo);
+  }
+
+  @Override
+  public Date getPrimitiveJavaObject(Object o) {
+    if (o == null) {
+      return null;
+    }
+    LocalDate date = (LocalDate) o;
+    return Date.ofEpochDay((int) date.toEpochDay());

Review comment:
       Why use `toEpochDay` here, but `daysFromDate` in the other method? I think both should use `daysFromDate` since that's what we use elsewhere and trust the most.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496945024



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private static Boolean hive3PresentOnClasspath = null;

Review comment:
       This could be `private static final` if you used a `private static` method to set it initially, I think:
   
   ```java
   private static final boolean HIVE3_PRESENT_ON_CLASSPATH = detectHive3();
   
   public static boolean hive3PresentOnClasspath() {'
     return HIVE3_PRESENT_ON_CLASSPATH;
   }
   
   private static boolean detectHive3() {
     try {
       Class.forName(...);
       return true;
     } catch (ClassNotFoundException e) {
       return false;
     }
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r493817589



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {

Review comment:
       thanks, yes, that could be a good idea (as long as their dependencies remain the same)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695769420


   Thanks @rdsr for sharing your thoughts!
   1.) I agree with you, hive-metastore should be left as is. I haven't changed the hive-metastore module other than fixing the teardown problem in the unit tests, which occurs when using hive 3 dependencies (discussed in https://github.com/apache/iceberg/pull/1455).
   2.) I think I've done what you just described, only under different module names. Right now, common code resides in `iceberg-mr`, and incompatible codes have been moved out to `iceberg-mr-hive2` and `icebegr-mr-hive3`. I'm open to renaming the modules to `exec`, sounds like a more suitable name. As for shipping different runtime jars, we might want to create separate runtime modules too just like for spark2 and 3.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496845233



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {
+  project(':iceberg-mr-hive3') {
+
+    // run the tests in iceberg-mr with Hive3 dependencies
+    sourceSets {
+      test {
+        java.srcDirs = ['../mr/src/test/java', 'src/test/java']
+        resources.srcDirs = ['../mr/src/test/resources', 'src/test/resources']
+      }
+    }
+
+    // exclude these Hive2-specific tests from iceberg-mr
+    test {
+      exclude '**/TestIcebergDateObjectInspector.class'
+      exclude '**/TestIcebergTimestampObjectInspector.class'
+    }
+
+    configurations.all {
+      resolutionStrategy.eachDependency { dep ->
+        if (dep.requested.group == 'org.apache.hive' && dep.requested.name != 'hive-storage-api') {
+          dep.useVersion '3.1.2'
+        } else if (dep.requested.group == 'org.apache.hadoop') {
+          dep.useVersion '3.1.0'
+        }
+      }
+    }

Review comment:
       yes, you're right, don't need. I've set the versions explicitly instead

##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {
+  project(':iceberg-mr-hive3') {
+
+    // run the tests in iceberg-mr with Hive3 dependencies
+    sourceSets {
+      test {
+        java.srcDirs = ['../mr/src/test/java', 'src/test/java']
+        resources.srcDirs = ['../mr/src/test/resources', 'src/test/resources']
+      }
+    }
+
+    // exclude these Hive2-specific tests from iceberg-mr
+    test {
+      exclude '**/TestIcebergDateObjectInspector.class'
+      exclude '**/TestIcebergTimestampObjectInspector.class'
+    }
+
+    configurations.all {
+      resolutionStrategy.eachDependency { dep ->
+        if (dep.requested.group == 'org.apache.hive' && dep.requested.name != 'hive-storage-api') {
+          dep.useVersion '3.1.2'
+        } else if (dep.requested.group == 'org.apache.hadoop') {
+          dep.useVersion '3.1.0'
+        }
+      }
+    }

Review comment:
       yes, you're right, don't need that block. I've set the versions explicitly instead




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496948397



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergObjectInspector.java
##########
@@ -90,7 +91,13 @@ public void testIcebergObjectInspector() {
     Assert.assertEquals(3, dateField.getFieldID());
     Assert.assertEquals("date_field", dateField.getFieldName());
     Assert.assertEquals("date comment", dateField.getFieldComment());
-    Assert.assertEquals(IcebergDateObjectInspector.get(), dateField.getFieldObjectInspector());
+    if (MetastoreUtil.hive3PresentOnClasspath()) {
+      Assert.assertEquals(
+              "org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3",
+              dateField.getFieldObjectInspector().getClass().getName());

Review comment:
       I'd prefer using the same assertion, but with a different class name. But this is really minor if tests are passing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497494123



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -27,12 +27,25 @@
 import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.common.DynMethods;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public final class IcebergObjectInspector extends TypeUtil.SchemaVisitor<ObjectInspector> {
 
+  // get the correct inspectors depending on whether we're working with Hive2 or Hive3 dependencies
+  // we need to do this because there is a breaking API change in Date/TimestampObjectInspector between Hive2 and Hive3
+  private static final DynMethods.StaticMethod DATE_INSPECTOR = DynMethods.builder("get")
+      .impl("org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3", null)

Review comment:
       you're right, no need for the null




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695809197


   Thanks @marton-bod for your comment. Is it possible, then, to only have hive3 dependencies under `iceberg-mr-hive3` instead of building other modules too with hive3 and hadoop3?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496310879



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java
##########
@@ -41,7 +41,16 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected HiveMetaStoreClient newClient()  {
     try {
-      return new HiveMetaStoreClient(hiveConf);
+      // create the metastore client based on whether we're working with Hive2 or Hive3 dependencies
+      // we need to do this because there is a breaking API change between Hive2 and Hive3
+      if (MetastoreUtil.hive3PresentOnClasspath()) {
+        return (HiveMetaStoreClient) Class
+                .forName(HiveMetaStoreClient.class.getName())
+                .getConstructor(Configuration.class)
+                .newInstance(hiveConf);

Review comment:
       Can you use `DynCtors` instead? I think this can be simpler:
   
   ```java
     private static final DynConstructors.Ctor<HiveMetaStoreClient> CLIENT_CTOR = DynConstructors.builder()
         .impl(HiveMetaStoreClient.class, HiveConf.class)
         .impl(HiveMetaStoreClient.class, Configuration.class)
         .build();
   
     protected HiveMetaStoreClient newClient()  {
       try {
         return CLIENT_CTOR.newInstance(hiveConf);
       } catch (...) {
         ...
       }
     }
   ```
   
   This also exposes a bug with the reflection path: `MetaException` is no longer thrown in the `try` block. Since that's a checked exception, it will be wrapped in a `RuntimeException`. You'll need to replace that block with `catch (RuntimeException e) { ... }` and check the cause of the exception for a `MetaException`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   In order to build `iceberg-mr`/`iceberg-mr-hive3`, we need to build the other iceberg subprojects with Hive3/Hadoop3 as well which `mr` depends on, such as `hive-metastore`, `core`, `parquet`, etc. For example, there is a metastore client API change between Hive2 and Hive3, therefore if we didn't build the `hive-metastore` module with Hive3, `mr` (with Hive3) would not be able to communicate with it.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and only a subset of modules will be built when this flag is enabled, only those modules that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although both Spark3 and Flink could be included at any point if the community agreed, since they pass all the tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497852929



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergObjectInspector.java
##########
@@ -90,7 +91,13 @@ public void testIcebergObjectInspector() {
     Assert.assertEquals(3, dateField.getFieldID());
     Assert.assertEquals("date_field", dateField.getFieldName());
     Assert.assertEquals("date comment", dateField.getFieldComment());
-    Assert.assertEquals(IcebergDateObjectInspector.get(), dateField.getFieldObjectInspector());
+    if (MetastoreUtil.hive3PresentOnClasspath()) {
+      Assert.assertEquals(
+              "org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3",
+              dateField.getFieldObjectInspector().getClass().getName());

Review comment:
       Nit: indentation is off.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496951903



##########
File path: hive3/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergTimestampObjectInspectorHive3.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDateTime;
+import java.time.OffsetDateTime;
+import java.time.ZoneOffset;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.TimestampObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestIcebergTimestampObjectInspectorHive3 {
+
+  @Test
+  public void testIcebergTimestampObjectInspector() {
+    TimestampObjectInspector oi = IcebergTimestampObjectInspectorHive3.get(false);
+
+    Assert.assertEquals(ObjectInspector.Category.PRIMITIVE, oi.getCategory());
+    Assert.assertEquals(PrimitiveObjectInspector.PrimitiveCategory.TIMESTAMP, oi.getPrimitiveCategory());
+
+    Assert.assertEquals(TypeInfoFactory.timestampTypeInfo, oi.getTypeInfo());
+    Assert.assertEquals(TypeInfoFactory.timestampTypeInfo.getTypeName(), oi.getTypeName());
+
+    Assert.assertEquals(Timestamp.class, oi.getJavaPrimitiveClass());
+    Assert.assertEquals(TimestampWritableV2.class, oi.getPrimitiveWritableClass());
+
+    Assert.assertNull(oi.copyObject(null));
+    Assert.assertNull(oi.getPrimitiveJavaObject(null));
+    Assert.assertNull(oi.getPrimitiveWritableObject(null));
+
+    LocalDateTime local = LocalDateTime.of(2020, 1, 1, 0, 0, 1, 500);
+    Timestamp ts = Timestamp.valueOf("2020-01-01 00:00:01.00000050");

Review comment:
       We should use instants to initialize all date/time values so that we are not hit by time zone bugs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496734032



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Unfortunately Hive3 only works with Java8. There are a number of classloader related issues when running Hive3 on Java11.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-700809758


   Thanks a lot for your review @rdblue, appreciate it! I agree with your comments and have made the improvements.
   Good idea to use `DynMethods`/`DynConstructors`, quite handy!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r492380065



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspector.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspector extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {
+
+  private static final IcebergDateObjectInspector INSTANCE = new IcebergDateObjectInspector();
+
+  public static IcebergDateObjectInspector get() {
+    return INSTANCE;
+  }
+
+  private IcebergDateObjectInspector() {
+    super(TypeInfoFactory.dateTypeInfo);
+  }
+
+  @Override
+  public Date getPrimitiveJavaObject(Object o) {
+    if (o == null) {
+      return null;
+    }
+    LocalDate date = (LocalDate) o;
+    return Date.ofEpochDay((int) date.toEpochDay());

Review comment:
       Good point, thanks, I will change to use the same method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496861453



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {

Review comment:
       Yeah, what can I say... 😢  The upgrade work is ongoing and I believe is tracked here: https://issues.apache.org/jira/browse/HIVE-22415




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496311031



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private MetastoreUtil() {
+

Review comment:
       No need for this empty line. Could you remove it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496844294



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+public class MetastoreUtil {
+
+  // this class is unique to Hive3 and cannot be found in Hive2, therefore a good proxy to see if
+  // we are working against Hive3 dependencies
+  private static final String HIVE3_UNIQUE_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
+
+  private MetastoreUtil() {
+
+  }
+
+  /**
+   * @return true if Hive3 dependencies are found on classpath, false otherwise
+   */
+  public static boolean hive3PresentOnClasspath() {

Review comment:
       Agree, create a static variable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r492325706



##########
File path: mr-hive3/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergDateObjectInspector.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive.serde.objectinspector;
+
+import java.time.LocalDate;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.serde2.io.DateWritableV2;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.iceberg.util.DateTimeUtil;
+
+public final class IcebergDateObjectInspector extends AbstractPrimitiveJavaObjectInspector
+                                              implements DateObjectInspector {
+
+  private static final IcebergDateObjectInspector INSTANCE = new IcebergDateObjectInspector();
+
+  public static IcebergDateObjectInspector get() {
+    return INSTANCE;
+  }
+
+  private IcebergDateObjectInspector() {
+    super(TypeInfoFactory.dateTypeInfo);
+  }
+
+  @Override
+  public Date getPrimitiveJavaObject(Object o) {
+    if (o == null) {
+      return null;
+    }
+    LocalDate date = (LocalDate) o;
+    return Date.ofEpochDay((int) date.toEpochDay());

Review comment:
       Why use `toEpochDay` here, but `daysFromDate` in the other method? I think both should use `daysFromDate` since that's what we use elsewhere and trust the most.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-705065368


   Thanks for all your hard work on this, @marton-bod! Great to have support for Hive 3 now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-695824861


   I suppose it might work, but I think it would give us more useful test coverage if the other iceberg subprojects, which `mr` depends on, are built using Hive3/Hadoop3 as well. For instance, most likely if the `mr` code uses Hive3, then the metastore it connects to will use Hive3 as well, so their interoperability should be tested. Let me know what you think.
   
   However, just to reiterate, the idea is that for the 'normal' gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the `-Dhive3` flag to the build, will Hive3/Hadoop3 dependencies be used (and with this flag is enabled, only those modules are included that are ready for the Hive3-build - i.e. Spark and Flink excluded for now, although both Spark3 and Flink could be included at any point if the community agreed, since they already pass all the tests)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-702191761


   Thanks. I'm looking into the flaky test, it still occurs intermittently.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496946840



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -27,12 +27,25 @@
 import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.common.DynMethods;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public final class IcebergObjectInspector extends TypeUtil.SchemaVisitor<ObjectInspector> {
 
+  // get the correct inspectors depending on whether we're working with Hive2 or Hive3 dependencies
+  // we need to do this because there is a breaking API change in Date/TimestampObjectInspector between Hive2 and Hive3
+  private static final DynMethods.StaticMethod DATE_INSPECTOR = DynMethods.builder("get")
+      .impl("org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3", null)
+      .impl(IcebergDateObjectInspector.class, null)

Review comment:
       It is safer to use the class name string instead of the class here, so that we know it won't be loaded and break at runtime.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r497494030



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/serde/objectinspector/TestIcebergObjectInspector.java
##########
@@ -90,7 +91,13 @@ public void testIcebergObjectInspector() {
     Assert.assertEquals(3, dateField.getFieldID());
     Assert.assertEquals("date_field", dateField.getFieldName());
     Assert.assertEquals("date comment", dateField.getFieldComment());
-    Assert.assertEquals(IcebergDateObjectInspector.get(), dateField.getFieldObjectInspector());
+    if (MetastoreUtil.hive3PresentOnClasspath()) {
+      Assert.assertEquals(
+              "org.apache.iceberg.mr.hive.serde.objectinspector.IcebergDateObjectInspectorHive3",
+              dateField.getFieldObjectInspector().getClass().getName());

Review comment:
       agree




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r491911188



##########
File path: build.gradle
##########
@@ -458,14 +471,29 @@ project(':iceberg-mr') {
     testCompile("org.apache.calcite:calcite-core")
     testCompile("com.esotericsoftware:kryo-shaded:4.0.2")
     testCompile("com.fasterxml.jackson.core:jackson-annotations:2.6.5")
-    testCompile("com.klarna:hiverunner:5.2.1") {
+    testCompile("com.klarna:hiverunner") {
       exclude group: 'javax.jms', module: 'jms'
       exclude group: 'org.apache.hive', module: 'hive-exec'
       exclude group: 'org.codehaus.jettison', module: 'jettison'
       exclude group: 'org.apache.calcite.avatica'
     }
   }
 }
+if (!gradle.ext.hive3Enabled) {
+  project(':iceberg-mr-hive2') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde")
+    }
+  }
+} else {
+  project(':iceberg-mr-hive3') {
+    dependencies {
+      compile project(':iceberg-core')
+      compileOnly("org.apache.hive:hive-serde:3.1.2")

Review comment:
       Is this version not set in `versions-hive3.props` already? Or is there some reason you need to explicitly set it here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496312063



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
##########
@@ -94,8 +104,24 @@ public String getDatabasePath(String dbName) {
   private TServer newThriftServer(TServerSocket socket, HiveConf conf) throws Exception {
     HiveConf serverConf = new HiveConf(conf);
     serverConf.set(HiveConf.ConfVars.METASTORECONNECTURLKEY.varname, "jdbc:derby:" + getDerbyPath() + ";create=true");
-    HiveMetaStore.HMSHandler baseHandler = new HiveMetaStore.HMSHandler("new db based metaserver", serverConf);
-    IHMSHandler handler = RetryingHMSHandler.getProxy(serverConf, baseHandler, false);
+
+    // create the metastore handlers based on whether we're working with Hive2 or Hive3 dependencies
+    // we need to do this because there is a breaking API change between Hive2 and Hive3
+    HiveMetaStore.HMSHandler baseHandler;
+    IHMSHandler handler;
+    if (MetastoreUtil.hive3PresentOnClasspath()) {
+      baseHandler = (HiveMetaStore.HMSHandler) Class
+              .forName(HiveMetaStore.HMSHandler.class.getName())
+              .getConstructor(String.class, Configuration.class)
+              .newInstance("new db based metaserver", serverConf);
+      handler = (IHMSHandler) Class
+              .forName(RetryingHMSHandler.class.getName())
+              .getDeclaredMethod("getProxy", Configuration.class, IHMSHandler.class, boolean.class)
+              .invoke(null, serverConf, baseHandler, false);
+    } else {
+      baseHandler = new HiveMetaStore.HMSHandler("new db based metaserver", serverConf);
+      handler = RetryingHMSHandler.getProxy(serverConf, baseHandler, false);
+    }

Review comment:
       This is also a good place to use the reflection helpers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-701466352


   Thanks @rdblue once again for the improvement suggestions - I've address them.
   As for JDK11, we do have test failures unfortunately (https://github.com/apache/iceberg/pull/1478#discussion_r497471538) therefore we'll need to go with JDK8 only for Hive3.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496313414



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -72,7 +75,20 @@ public ObjectInspector primitive(Type.PrimitiveType primitiveType) {
         primitiveTypeInfo = TypeInfoFactory.booleanTypeInfo;
         break;
       case DATE:
-        return IcebergDateObjectInspector.get();
+        // create the correct inspector based on whether we're working with Hive2 or Hive3 dependencies
+        // we need to do this because there is a breaking API change in DateObjectInspector between Hive2 and Hive3
+        if (MetastoreUtil.hive3PresentOnClasspath()) {

Review comment:
       And, translating these to use `DynMethods` would make it cleaner. I think both paths should use reflection and that's easy with `DynMethods`. You just provide multiple `impl` calls and the first one that is found will be used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496844697



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -72,7 +75,20 @@ public ObjectInspector primitive(Type.PrimitiveType primitiveType) {
         primitiveTypeInfo = TypeInfoFactory.booleanTypeInfo;
         break;
       case DATE:
-        return IcebergDateObjectInspector.get();
+        // create the correct inspector based on whether we're working with Hive2 or Hive3 dependencies
+        // we need to do this because there is a breaking API change in DateObjectInspector between Hive2 and Hive3
+        if (MetastoreUtil.hive3PresentOnClasspath()) {

Review comment:
       yep, makes sense

##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {
+  project(':iceberg-mr-hive3') {

Review comment:
       sure, renamed it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496313105



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergObjectInspector.java
##########
@@ -72,7 +75,20 @@ public ObjectInspector primitive(Type.PrimitiveType primitiveType) {
         primitiveTypeInfo = TypeInfoFactory.booleanTypeInfo;
         break;
       case DATE:
-        return IcebergDateObjectInspector.get();
+        // create the correct inspector based on whether we're working with Hive2 or Hive3 dependencies
+        // we need to do this because there is a breaking API change in DateObjectInspector between Hive2 and Hive3
+        if (MetastoreUtil.hive3PresentOnClasspath()) {

Review comment:
       Because the object inspector is a singleton, this could be done just once to set a static field. Then all this would need to do is return the `IcebergObjectInspector.DATE_INSPECTOR`. I think that would be better than running reflect code every invocation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-702278857


   @marton-bod, we were just talking about test metastores on #1495: https://github.com/apache/iceberg/pull/1495#discussion_r498394239
   
   I think part of the problem is that this is creating a new metastore instance for each test case. That's going to take longer and doesn't catch connection leaks. That's probably also causing the issue here, where something isn't cleaned up properly. I recommend moving Metastore setup to a `@BeforeClass`, like in [`SparkTestBase`](https://github.com/apache/iceberg/blob/d8a6f7fd19025005df75e06bfa5ab7417aad803a/spark/src/test/java/org/apache/iceberg/spark/SparkTestBase.java#L51).
   
   I think that would address the issue here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496313735



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {
+  project(':iceberg-mr-hive3') {

Review comment:
       Can we rename this to `:iceberg-hive3`? I don't think that we need the `mr` part because this doesn't really contain MR classes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-697256274


   @rdblue @massdosage @rdsr 
   Do you guys think we can go ahead with this change to enable Hive 3 builds? 
   Let me know if you have any suggestions, I'm open to discussion and changes. Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-700901054


   Thank you, @marton-bod! Looking good but there are a few more issues. Mostly, I'd prefer not to parse values in tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#issuecomment-698106959


   > Thanks for working on this, @marton-bod! And sorry for the delay in replying on this. I was initially focused on trying to avoid the problem of needing both hive 2 and hive 3 modules, but I don't see a way around it because the OI interfaces now specify incompatible objects. So I agree that we will need additional modules to handle this.
   > 
   > But, I think there are some ways to simplify the changes this introduces. Because the changes needed between 2 and 3 are minor, I think the goal should be to produce a single iceberg-hive-runtime Jar that works in both versions. To do that, we need to build a `DateObjectInspector` for Hive 2 and one for Hive 3, but return the correct one at runtime using reflection. That way, we detect whether to use Hive 2 or 3 (e.g., by checking if `DateWritableV2` exists) and return an OI that matches. The other class would not be loaded, so it would not cause any issues.
   > 
   > I think that we can achieve this using just one new module, iceberg-hive3, that adds the new object inspectors. The other module could continue to depend on Hive 2.
   > 
   > I'd like to avoid selecting `versions.props` based on flags, so we would ideally just embed the Hive 3 version in the iceberg-hive3 dependency. That module should also depend on the iceberg-mr module so that it can run the same tests (maybe it can set the test source directory to share with hive2?). This module would have both hive 2 and hive 3 classes in its test classpath, so it would validate that having both doesn't break Hive. And, since Hadoop it always pulled in as a test dependency, it can use Hadoop 3.
   > 
   > Finally, the iceberg-hive-runtime module would pull in both iceberg-hive and iceberg-hive3 so that all of the classes are in our runtime module.
   > 
   > I think this would greatly simplify the support:
   > 
   > 1. It would add only one new module
   > 2. It would avoid using build flags
   > 
   > What do you think?
   
   +1. This seems like a much cleaner approach if we can get this working!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1478: Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1478:
URL: https://github.com/apache/iceberg/pull/1478#discussion_r496314459



##########
File path: build.gradle
##########
@@ -468,6 +468,79 @@ project(':iceberg-mr') {
   }
 }
 
+if (jdkVersion == '8') {
+  project(':iceberg-mr-hive3') {
+
+    // run the tests in iceberg-mr with Hive3 dependencies
+    sourceSets {
+      test {
+        java.srcDirs = ['../mr/src/test/java', 'src/test/java']
+        resources.srcDirs = ['../mr/src/test/resources', 'src/test/resources']
+      }
+    }
+
+    // exclude these Hive2-specific tests from iceberg-mr
+    test {
+      exclude '**/TestIcebergDateObjectInspector.class'
+      exclude '**/TestIcebergTimestampObjectInspector.class'
+    }
+
+    configurations.all {
+      resolutionStrategy.eachDependency { dep ->
+        if (dep.requested.group == 'org.apache.hive' && dep.requested.name != 'hive-storage-api') {
+          dep.useVersion '3.1.2'
+        } else if (dep.requested.group == 'org.apache.hadoop') {
+          dep.useVersion '3.1.0'
+        }
+      }
+    }

Review comment:
       Is this block needed? I think we should be able to explicitly set dependency versions in the `dependencies` block, like this:
   
   ```
   compileOnly("org.apache.hive:hive-exec:3.1.2:core") {
     ...
   }
   
   compileOnly("org.apache.hadoop:hadoop-client:3.1.0") {
     ...
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org