You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/26 15:06:32 UTC

[GitHub] [iceberg] pvary opened a new pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

pvary opened a new pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278


   During #2254 we postponed to create tests for tables where the column type is timestamp.withZone().
   This PR adds them with the appropriate tests.
   
   The premise:
   - Hive2 uses sql.Timestamp internally - which is in the LocalTimezone
   - Hive3 uses hive.Timestamp internally - which is in UTC
   - Iceberg timestamp.withZone() stores the values in OffsetDateTime in UTC
   
   What I have done:
   - Fixed the conversions for `IcebergTimestampWithZoneObjectInspector`
   - Simplified a code a little bit for `IcebergTimestampObjectInspectorHive3`
   - Added the junit tests for `SELECT`
      - Had to  add one more magic setting so the TimeZone is handled correctly
   - Added additional check to the existing unit tests to make sure that we get back the same string in SQL as we insert
   - Had to change the unit tests for Hive2, so the expected value in the Iceberg record is considers the TimeZone


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584674839



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       Why is it a lazy implementation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584695422



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       Ah I see. I think I misunderstood it as something to do with lazy loading/initialization...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584738425



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       Updated the comment, could you please check?
   Thanks,
   Peter




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584683343



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       No ordering is implemented which should be done in a fool-proof implementation.
   Expects the records in a given order




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584735576



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.
+   * @param shell The shell used for executing the query
+   * @param tableName The name of the table to query
+   * @param expected The records we inserted
+   */
+  public static void validateDataWithSql(TestHiveShell shell, String tableName, List<Record> expected) {
+    List<Object[]> actual = shell.executeStatement("SELECT * from " + tableName);
+
+    for (int rowId = 0; rowId < expected.size(); ++rowId) {
+      Record record = expected.get(rowId);
+      Object[] row = actual.get(rowId);
+      Assert.assertEquals(record.size(), row.length);
+      for (int fieldId = 0; fieldId < record.size(); ++fieldId) {
+        Types.NestedField field = record.struct().fields().get(fieldId);
+        String inserted =
+            getStringValueForInsert(record.getField(field.name()), field.type()).replaceAll("'(.*)'", "$1");

Review comment:
       Fixed, could you please check the comment?
   Thanks,
   Peter




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584786430



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       Looks good, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584663089



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.

Review comment:
       nit: typo 'completeness sake`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584734792



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       fixed, thanks

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.

Review comment:
       fixed, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584837052



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.
+   * @param shell The shell used for executing the query
+   * @param tableName The name of the table to query
+   * @param expected The records we inserted
+   */
+  public static void validateDataWithSql(TestHiveShell shell, String tableName, List<Record> expected) {
+    List<Object[]> actual = shell.executeStatement("SELECT * from " + tableName);
+
+    for (int rowId = 0; rowId < expected.size(); ++rowId) {
+      Record record = expected.get(rowId);
+      Object[] row = actual.get(rowId);
+      Assert.assertEquals(record.size(), row.length);
+      for (int fieldId = 0; fieldId < record.size(); ++fieldId) {
+        Types.NestedField field = record.struct().fields().get(fieldId);
+        String inserted =
+            getStringValueForInsert(record.getField(field.name()), field.type()).replaceAll("'(.*)'", "$1");

Review comment:
       Changed the formatting as you have suggested.
   Thanks,
   Peter




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584678861



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergTimestampWithZoneObjectInspector.java
##########
@@ -42,12 +42,13 @@ private IcebergTimestampWithZoneObjectInspector() {
 
   @Override
   public OffsetDateTime convert(Object o) {
-    return o == null ? null : OffsetDateTime.ofInstant(((Timestamp) o).toInstant(), ZoneOffset.UTC);
+    return o == null ? null : OffsetDateTime.of(((Timestamp) o).toLocalDateTime(), ZoneOffset.UTC);
   }
 
   @Override
   public Timestamp getPrimitiveJavaObject(Object o) {
-    return o == null ? null : Timestamp.from(((OffsetDateTime) o).toInstant());
+    return o == null ? null :

Review comment:
       Just to clarify: I see that only the hive2 withZone object inspector is changed. Does that mean that the predicate pushdown problem only occurred on hive2?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584687461



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergTimestampWithZoneObjectInspector.java
##########
@@ -42,12 +42,13 @@ private IcebergTimestampWithZoneObjectInspector() {
 
   @Override
   public OffsetDateTime convert(Object o) {
-    return o == null ? null : OffsetDateTime.ofInstant(((Timestamp) o).toInstant(), ZoneOffset.UTC);
+    return o == null ? null : OffsetDateTime.of(((Timestamp) o).toLocalDateTime(), ZoneOffset.UTC);
   }
 
   @Override
   public Timestamp getPrimitiveJavaObject(Object o) {
-    return o == null ? null : Timestamp.from(((OffsetDateTime) o).toInstant());
+    return o == null ? null :

Review comment:
       With Hive3 we can not convert the Hive filter to an Iceberg filter so we do not push it down with Hive3:
   
   ```
   2021-03-01T13:49:33,168 WARN  [f0a11142-f200-4819-b157-eebcff2d39ea Test worker] hive.HiveIcebergInputFormat (HiveIcebergInputFormat.java:getSplits(64)) - Unable to create Iceberg filter, continuing without filter (will be applied by Hive later): 
   java.lang.UnsupportedOperationException: CONSTANT operator is not supported
   	at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.translate(HiveIcebergFilterFactory.java:85) ~[iceberg-mr-1e4412e.dirty.jar:?]
   	at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.generateFilterExpression(HiveIcebergFilterFactory.java:56) ~[iceberg-mr-1e4412e.dirty.jar:?]
   	at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:61) [iceberg-mr-1e4412e.dirty.jar:?]
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2691) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229) [hive-exec-3.1.2-core.jar:3.1.2]
   	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:460) [hive-service-3.1.2.jar:3.1.2]
   	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:309) [hive-service-3.1.2.jar:3.1.2]
   	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:905) [hive-service-3.1.2.jar:3.1.2]
   	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:561) [hive-service-3.1.2.jar:3.1.2]
   	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:553) [hive-service-3.1.2.jar:3.1.2]
   	at org.apache.iceberg.mr.hive.TestHiveShell.executeStatement(TestHiveShell.java:143) [test/:?]
   	at org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerTimezone.testTimestampTzQuery(TestHiveIcebergStorageHandlerTimezone.java:214) [test/:?]
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_172]
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_172]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_172]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_172]
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) [junit-4.12.jar:4.12]
   	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) [junit-4.12.jar:4.12]
   	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
   	at org.junit.runners.Suite.runChild(Suite.java:128) [junit-4.12.jar:4.12]
   	at org.junit.runners.Suite.runChild(Suite.java:27) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) [junit-4.12.jar:4.12]
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) [junit-4.12.jar:4.12]
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) [gradle-testing-jvm-5.4.1.jar:5.4.1]
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) [gradle-testing-jvm-5.4.1.jar:5.4.1]
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) [gradle-testing-jvm-5.4.1.jar:5.4.1]
   	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) [gradle-testing-jvm-5.4.1.jar:5.4.1]
   	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) [gradle-testing-base-5.4.1.jar:5.4.1]
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_172]
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_172]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_172]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_172]
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) [gradle-messaging-5.4.1.jar:5.4.1]
   	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) [?:?]
   	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) [gradle-testing-base-5.4.1.jar:5.4.1]
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_172]
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_172]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_172]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_172]
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) [gradle-messaging-5.4.1.jar:5.4.1]
   	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) [gradle-base-services-5.4.1.jar:5.4.1]
   	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46) [gradle-base-services-5.4.1.jar:5.4.1]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
   	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55) [gradle-base-services-5.4.1.jar:5.4.1]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
   ```
   
   I fully expect that we have to adjust differently for Hive3 and Hive2 when we fix this. But I think this should be handled in a different PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584737620



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -323,14 +323,16 @@ public void testInsertSupportedTypes() throws IOException {
         continue;
       }
       String columnName = type.typeId().toString().toLowerCase() + "_column";
+      String tableName = type.typeId().toString().toLowerCase() + "_table_" + i;
 
       Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, columnName, type));
       List<Record> expected = TestHelper.generateRandomRecords(schema, 5, 0L);
 
-      Table table = testTables.createTable(shell, type.typeId().toString().toLowerCase() + "_table_" + i,
-          schema, PartitionSpec.unpartitioned(), fileFormat, expected);
+      Table table = testTables.createTable(shell, tableName, schema, PartitionSpec.unpartitioned(), fileFormat,
+          expected);
 
       HiveIcebergTestUtils.validateData(table, expected, 0);

Review comment:
       updated javadoc, and renamed the method




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584664222



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.
+   * @param shell The shell used for executing the query
+   * @param tableName The name of the table to query
+   * @param expected The records we inserted
+   */
+  public static void validateDataWithSql(TestHiveShell shell, String tableName, List<Record> expected) {
+    List<Object[]> actual = shell.executeStatement("SELECT * from " + tableName);
+
+    for (int rowId = 0; rowId < expected.size(); ++rowId) {
+      Record record = expected.get(rowId);
+      Object[] row = actual.get(rowId);
+      Assert.assertEquals(record.size(), row.length);
+      for (int fieldId = 0; fieldId < record.size(); ++fieldId) {
+        Types.NestedField field = record.struct().fields().get(fieldId);
+        String inserted =
+            getStringValueForInsert(record.getField(field.name()), field.type()).replaceAll("'(.*)'", "$1");

Review comment:
       Can you please provide some explanatory comments to the regex logic `.replaceAll("'(.*)'", "$1")`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584662744



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -107,6 +113,13 @@
               PrimitiveObjectInspectorFactory.writableStringObjectInspector
           ));
 
+  public static final DateTimeFormatter timestampWithTZFormatter = new DateTimeFormatterBuilder()

Review comment:
       can we adjust the casing to align with the other static final fields in the class?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584662949



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted

Review comment:
       nit: typo `that the returned...`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584790632



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.
+   * @param shell The shell used for executing the query
+   * @param tableName The name of the table to query
+   * @param expected The records we inserted
+   */
+  public static void validateDataWithSql(TestHiveShell shell, String tableName, List<Record> expected) {
+    List<Object[]> actual = shell.executeStatement("SELECT * from " + tableName);
+
+    for (int rowId = 0; rowId < expected.size(); ++rowId) {
+      Record record = expected.get(rowId);
+      Object[] row = actual.get(rowId);
+      Assert.assertEquals(record.size(), row.length);
+      for (int fieldId = 0; fieldId < record.size(); ++fieldId) {
+        Types.NestedField field = record.struct().fields().get(fieldId);
+        String inserted =
+            getStringValueForInsert(record.getField(field.name()), field.type()).replaceAll("'(.*)'", "$1");

Review comment:
       Yeah, the comment makes it much clearer.
   Maybe a small formatting suggestion? Let me know what you think.
   ```
   String inserted = getStringValueForInsert(record.getField(field.name()), field.type())
       // If there are enclosing quotes then remove them
       .replaceAll("'(.*)'", "$1");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584663089



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -265,4 +278,57 @@ public static void validateFiles(Table table, Configuration conf, JobID jobId, i
     Assert.assertEquals(dataFileNum, dataFiles.size());
     Assert.assertFalse(new File(HiveIcebergOutputCommitter.generateJobLocation(conf, jobId)).exists());
   }
+
+  /**
+   * Lazy implementation for checking the the returned results from a SELECT statement are the same than the inserted
+   * values. We expect that the values are inserted using {@link #getStringValueForInsert(Object, Type)}.
+   * <p>
+   * For completeness shake we might want to implement sorting when needed.

Review comment:
       nit: typo `completeness sake`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584697687



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergTimestampWithZoneObjectInspector.java
##########
@@ -42,12 +42,13 @@ private IcebergTimestampWithZoneObjectInspector() {
 
   @Override
   public OffsetDateTime convert(Object o) {
-    return o == null ? null : OffsetDateTime.ofInstant(((Timestamp) o).toInstant(), ZoneOffset.UTC);
+    return o == null ? null : OffsetDateTime.of(((Timestamp) o).toLocalDateTime(), ZoneOffset.UTC);
   }
 
   @Override
   public Timestamp getPrimitiveJavaObject(Object o) {
-    return o == null ? null : Timestamp.from(((OffsetDateTime) o).toInstant());
+    return o == null ? null :

Review comment:
       Get it, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584734473



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java
##########
@@ -107,6 +113,13 @@
               PrimitiveObjectInspectorFactory.writableStringObjectInspector
           ));
 
+  public static final DateTimeFormatter timestampWithTZFormatter = new DateTimeFormatterBuilder()

Review comment:
       Fixed, also fixed in `TestHiveIcebergStorageHandlerTimezone`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#discussion_r584675503



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -323,14 +323,16 @@ public void testInsertSupportedTypes() throws IOException {
         continue;
       }
       String columnName = type.typeId().toString().toLowerCase() + "_column";
+      String tableName = type.typeId().toString().toLowerCase() + "_table_" + i;
 
       Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, columnName, type));
       List<Record> expected = TestHelper.generateRandomRecords(schema, 5, 0L);
 
-      Table table = testTables.createTable(shell, type.typeId().toString().toLowerCase() + "_table_" + i,
-          schema, PartitionSpec.unpartitioned(), fileFormat, expected);
+      Table table = testTables.createTable(shell, tableName, schema, PartitionSpec.unpartitioned(), fileFormat,
+          expected);
 
       HiveIcebergTestUtils.validateData(table, expected, 0);

Review comment:
       Now that we have a way of validating data via SQL as well, can we somehow reflect the difference that `validateData` queries that data via Iceberg API? Maybe through the method names or their respective javadocs




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2278: Hive: Fix predicate pushdown for Timestamp.withZone()

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2278:
URL: https://github.com/apache/iceberg/pull/2278#issuecomment-788079096


   @shardulm94: Could you please review this PR? This is the TimestampTz part of our previous change (#2254)
   
   Thanks,
   Peter


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org