You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/11/04 20:43:11 UTC

[GitHub] [hive] pvary opened a new pull request #2765: HIVE-25673: Column pruning fix for MR tasks

pvary opened a new pull request #2765:
URL: https://github.com/apache/hive/pull/2765


   ### What changes were proposed in this pull request?
   When updating column pruning information `READ_NESTED_COLUMN_PATH_CONF_STR`, update `READ_COLUMN_NAMES_CONF_STR` and `READ_COLUMN_IDS_CONF_STR` as well. 
   
   ### Why are the changes needed?
   Iceberg MR queries are failing if multiple tables are queried and several columns are pruned
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744563434



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestTables.java
##########
@@ -437,7 +437,7 @@ public String identifier(String tableIdentifier) {
       }
 
       Assert.assertTrue(location.delete());
-      return location.toString();
+      return "file://" + location;

Review comment:
       If the table location does not match the file locations then we had a problem matching the `alias` to the `path`, and failed to add the configuration values to the HiveConf.
   
   We need to use the full path in the table location to match the file names.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744503927



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##########
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List<Record> records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List<Record> records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
       How's the test coverage for this on Tez? Is it already covered, or do we need to run this for Tez as well to avoid future regressions?
   
   And a minor nit: we're setting the engine to MR but the test output will still show the parameter engine=Tez, which is not a big deal just hurts my OCD a bit :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744560669



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##########
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List<Record> records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List<Record> records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
       With Hive 4.0.0 we do not support Iceberg. When I tried to run the tests with MR, the inserts were not working. So I had to run the inserts with Tez, and then the test query with MR.
   
   OTOH this is a valid issue with MR, and older versions on Hive, where MR is supported. (Maybe on newer version as well)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744568098



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##########
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List<Record> records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List<Record> records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
       I am not sure that Hive removed MR support (I think not yet, but there is no active development maintaining it), but Hive - Iceberg definitely needs Tez.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary merged pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2765:
URL: https://github.com/apache/hive/pull/2765


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744505963



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestTables.java
##########
@@ -437,7 +437,7 @@ public String identifier(String tableIdentifier) {
       }
 
       Assert.assertTrue(location.delete());
-      return location.toString();
+      return "file://" + location;

Review comment:
       Why was this needed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2765: HIVE-25673: Column pruning fix for MR tasks

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744563870



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##########
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List<Record> records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List<Record> records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
       Got it. By the way, did you mean "With Hive 4.0.0 we do not support MR"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org