You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/06/23 07:32:52 UTC

[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656834319



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();

Review comment:
       Exactly. It is declared as an incompatible change that can break reading old data. The addRequiredColumn method will result in an exception unless the allowIncompatibleChanges method has been called.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org