You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:07:24 UTC

[GitHub] [hive] kuczoram opened a new pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

kuczoram opened a new pull request #2407:
URL: https://github.com/apache/hive/pull/2407


   …ge on Iceberg table
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656848564



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Actually in the result set a float column is returned as double and a decimal is returned as string, even though Hive has the columns with the right types. When I used a schema with float and decimal, I got exception when parsing the result. I guess this conversation happens when fetching the result set after calling the select through the shell. I didn't dig deeper but I can do it if you want.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655344122



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Shouldn't this be `Types.FloatType.get()` and `Types.DecimalType.of(2, 1)`? Or does the result set given back by Hive makes it difficult to use those types?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655304838



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =

Review comment:
       Can we move this closer to where it's first used?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)

Review comment:
       nit: can we merge these two declarations by not calling the `.build()` method separately (and elsewhere where it's the same pattern)? No strong feelings, so we can keep as is, but at least in my opinion it would make it a bit more streamlined

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();

Review comment:
       Oh okay, I see now why you kept the builder :)

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();

Review comment:
       You can't  add a required column without first calling `allowIncompatibleChanges()`? (because adding a required column is always backwards-incompatible?)

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       This is not actually filled with initial data in this scenario. Do we want to add initial data? What would happen if we read the data back after adding the required column, would it be NULLs or errors?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       This is not actually filled with initial data in this scenario. Do we want to add initial data? What would happen if we read the data back after adding the required column, would the old records have nulls for the `age` column, or we'd get a read-time error?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       the comment is not valid here anymore

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));

Review comment:
       Shouldn't this be `family_name`?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Shouldn't this be `Types.FloatType.get()` and `Types.DecimalType.of(2, 1)`?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Shouldn't this be `Types.FloatType.get()` and `Types.DecimalType.of(2, 1)`? Or does the result set given back by Hive makes it difficult to use those types?

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));
+
+    List<Record> expectedResults = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22d, "1.3")
+        .add(1L, 223344, 555.22d, "2.2").add(2L, -234, -342d, "-1.2").build();
+
+    // Check the select resut and the column types from Hive
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM types_table");
+    HiveIcebergTestUtils.validateData(expectedResults, HiveIcebergTestUtils.valueForRow(schemaForResultSet, rows), 0);
+
+    rows = shell.executeStatement("DESCRIBE types_table");
+    Assert.assertEquals("id", rows.get(0)[0]);

Review comment:
       you might want to use `shell.metastore().getTable()` to load the HMS table and then access its columns directly `table.getSd().getCols()`. But it's up to you. Just generally, we've avoided parsing the describe output so far




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656834319



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();

Review comment:
       Exactly. It is declared as an incompatible change that can break reading old data. The addRequiredColumn method will result in an exception unless the allowIncompatibleChanges method has been called.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656841849



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =

Review comment:
       Sure, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655345983



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));
+
+    List<Record> expectedResults = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22d, "1.3")
+        .add(1L, 223344, 555.22d, "2.2").add(2L, -234, -342d, "-1.2").build();
+
+    // Check the select resut and the column types from Hive
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM types_table");
+    HiveIcebergTestUtils.validateData(expectedResults, HiveIcebergTestUtils.valueForRow(schemaForResultSet, rows), 0);
+
+    rows = shell.executeStatement("DESCRIBE types_table");
+    Assert.assertEquals("id", rows.get(0)[0]);

Review comment:
       you might want to use `shell.metastore().getTable()` to load the HMS table and then access its columns directly `table.getSd().getCols()`. But it's up to you. Just generally, we've avoided parsing the describe output so far




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram merged pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram merged pull request #2407:
URL: https://github.com/apache/hive/pull/2407


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655304838



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =

Review comment:
       Can we move this closer to where it's first used?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656887256



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();

Review comment:
       No need, this approach works fine I think :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655307485



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();

Review comment:
       Oh okay, I see now why you kept the builder :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2407:
URL: https://github.com/apache/hive/pull/2407#issuecomment-865006033


   Thanks @kuczoram , looks great! Just a few questions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655312262



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       This is not actually filled with initial data in this scenario. Do we want to add initial data? What would happen if we read the data back after adding the required column, would the old records have nulls for the `age` column, or we'd get a read-time error?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656852627



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));
+
+    List<Record> expectedResults = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22d, "1.3")
+        .add(1L, 223344, 555.22d, "2.2").add(2L, -234, -342d, "-1.2").build();
+
+    // Check the select resut and the column types from Hive
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM types_table");
+    HiveIcebergTestUtils.validateData(expectedResults, HiveIcebergTestUtils.valueForRow(schemaForResultSet, rows), 0);
+
+    rows = shell.executeStatement("DESCRIBE types_table");
+    Assert.assertEquals("id", rows.get(0)[0]);

Review comment:
       Oh, ok, thanks for the hint. I will change that check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655310601



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();

Review comment:
       You can't  add a required column without first calling `allowIncompatibleChanges()`? (because adding a required column is always backwards-incompatible?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655305917



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)

Review comment:
       nit: can we merge these two declarations by not calling the `.build()` method separately (and elsewhere where it's the same pattern)? No strong feelings, so we can keep as is, but at least in my opinion it would make it a bit more streamlined




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656888363



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       Ah I see. Let's leave it as is then, but maybe a similar explanatory comment here like in other test case?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656844296



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));

Review comment:
       Yeah, it is a typo. Thanks for finding it. Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r665323311



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Added a comment about that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655317847



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       the comment is not valid here anymore




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r665314456



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       Sure, I added some comment about that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655335482



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));

Review comment:
       Shouldn't this be `family_name`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656848564



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Actually in the result set a float column is returned as double and a decimal is returned as string, even though Hive has the columns with the right types. I guess this conversation happen when fetching the result set after calling the select through the shell. I didn't dig deeper but I can do it if you want.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2407:
URL: https://github.com/apache/hive/pull/2407#issuecomment-865006033


   Thanks @kuczoram , looks great! Just a few questions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655344122



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Shouldn't this be `Types.FloatType.get()` and `Types.DecimalType.of(2, 1)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656839312



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       I wanted to add initial data. The problem is that since adding a required column is an incompatible change, there is no contract on what happens when trying to read the old data back. I tried and it behaves differently depending on the underlaying file format. 
   - For AVRO it fails with 
   `Caused by: java.lang.IllegalArgumentException: Missing required field: age at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)`
   - For ORC it fails with
   ```
   java.lang.IllegalArgumentException: Field 4 of type long is required and was not found.
   	 at org.apache.iceberg.orc.ORCSchemaUtil.buildOrcProjection(ORCSchemaUtil.java:310)
   ```
   - For PARQUET no error happens, if the required field is empty, NULL will return in the resultset.
   
   So this reading is not unified and I don't think it makes sense to write a test for it as there is no expected behaviour.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656843384



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       Oh, thanks, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656887484



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();

Review comment:
       Cool, sounds good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656889924



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, null);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().addRequiredColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        required(4, "age", Types.LongType.get()));
+
+    // Insert some data with age column from Hive.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (0L, 'Lily', 'Magenta', 28L), (1L, 'Roni', 'Purple', 33L)");
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    List<Record> customersWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Lily", "Magenta", 28L).add(1L, "Roni", "Purple", 33L).build();
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testAddColumnIntoStructToIcebergTable() throws IOException {
+    Schema schema = new Schema(required(1, "id", Types.LongType.get()), required(2, "person", Types.StructType
+        .of(required(3, "first_name", Types.StringType.get()), required(4, "last_name", Types.StringType.get()))));
+    List<Record> people = TestHelper.generateRandomRecords(schema, 3, 0L);
+
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "people", schema, fileFormat, people);
+    // Add a new column (age long) to the Iceberg table into the person struct
+    icebergTable.updateSchema().addColumn("person", "age", Types.LongType.get()).commit();
+
+    Schema schemaWithAge = new Schema(required(1, "id", Types.LongType.get()),
+        required(2, "person", Types.StructType.of(required(3, "first_name", Types.StringType.get()),
+            required(4, "last_name", Types.StringType.get()), optional(5, "age", Types.LongType.get()))));
+    List<Record> newPeople = TestHelper.generateRandomRecords(schemaWithAge, 2, 10L);
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "people"));
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newPeople);
+
+    List<Record> sortedExpected = new ArrayList<>(people);
+    sortedExpected.addAll(newPeople);
+    sortedExpected.sort(Comparator.comparingLong(record -> (Long) record.get(0)));
+    List<Object[]> rows = shell
+        .executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people order by id");
+    Assert.assertEquals(sortedExpected.size(), rows.size());
+    for (int i = 0; i < sortedExpected.size(); i++) {
+      Object[] row = rows.get(i);
+      Long id = (Long) sortedExpected.get(i).get(0);
+      Record person = (Record) sortedExpected.get(i).getField("person");
+      String lastName = (String) person.getField("last_name");
+      String firstName = (String) person.getField("first_name");
+      Long age = null;
+      if (person.getField("age") != null) {
+        age = (Long) person.getField("age");
+      }
+      Assert.assertEquals(id, (Long) row[0]);
+      Assert.assertEquals(firstName, (String) row[1]);
+      Assert.assertEquals(lastName, (String) row[2]);
+      Assert.assertEquals(age, row[3]);
+    }
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement("CREATE TABLE dummy_tbl (id bigint, first_name string, last_name string, age bigint)");
+    shell.executeStatement("INSERT INTO dummy_tbl VALUES (1, 'Lily', 'Blue', 34), (2, 'Roni', 'Grey', NULL)");
+    shell.executeStatement("INSERT INTO default.people SELECT id, named_struct('first_name', first_name, " +
+        "'last_name', last_name, 'age', age) from dummy_tbl");
+
+    rows = shell.executeStatement("SELECT id, person.first_name, person.last_name, person.age FROM default.people " +
+        "where id in (1, 2) order by id");
+    Assert.assertEquals(2, rows.size());
+    Assert.assertEquals((Long) 1L, (Long) rows.get(0)[0]);
+    Assert.assertEquals("Lily", (String) rows.get(0)[1]);
+    Assert.assertEquals("Blue", (String) rows.get(0)[2]);
+    Assert.assertEquals((Long) 34L, (Long) rows.get(0)[3]);
+    Assert.assertEquals((Long) 2L, (Long) rows.get(1)[0]);
+    Assert.assertEquals("Roni", (String) rows.get(1)[1]);
+    Assert.assertEquals("Grey", (String) rows.get(1)[2]);
+    Assert.assertNull(rows.get(1)[3]);
+  }
+
+  @Test
+  public void testMakeColumnRequiredInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new required column (age long) to the Iceberg table.
+    icebergTable.updateSchema().allowIncompatibleChanges().requireColumn("last_name").commit();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert some data with last_name no NULL.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', 'Purple')");
+
+    List<Record> customerRecords = TestHelper.RecordsBuilder
+        .newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", "Purple").build();
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customerRecords,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Should add test step to insert NULL value into the new required column. But at the moment it
+    // works inconsistently for different file types, so leave it for later when this behaviour is cleaned up.
+  }
+
+  @Test
+  public void testRemoveColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column from the table.
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+
+    Schema customerSchemaWithoutFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithoutFirstNameBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithoutFirstName).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive to see if the result doesn't contain the first_name column any more.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+
+    // Run a 'select first_name' and check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'first_name'", () -> {
+          shell.executeStatement("SELECT first_name FROM default.customers");
+        });
+
+    // Insert an entry from Hive to check if it can be inserted without the first_name column.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta')");
+
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    customersWithoutFirstNameBuilder.add(4L, "Magenta");
+    customersWithoutFirstName = customersWithoutFirstNameBuilder.build();
+    HiveIcebergTestUtils.validateData(customersWithoutFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithoutFirstName, rows), 0);
+  }
+
+  @Test
+  public void testRemoveAndAddBackColumnFromIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Remove the first_name column
+    icebergTable.updateSchema().deleteColumn("first_name").commit();
+    // Add a new column with the name first_name
+    icebergTable.updateSchema().addColumn("first_name", Types.StringType.get(), "This is new first name").commit();
+
+    // Add new data to the table with the new first_name column filled.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    Schema customerSchemaWithNewFirstName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "last_name", Types.StringType.get(), "This is last name"),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+    List<Record> newCustomersWithNewFirstName =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(3L, "Red", "James").build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomersWithNewFirstName);
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaWithNewFirstName).add(0L, "Brown", null)
+            .add(1L, "Green", null).add(2L, "Pink", null).add(3L, "Red", "James");
+    List<Record> customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+
+    // Run a 'select *' from Hive and check if the first_name column is returned.
+    // It should be null for the old data and should be filled in the entry added after the column addition.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    Schema customerSchemaWithNewFirstNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(3, "first_name", Types.StringType.get(), "This is the newly added first name"));
+
+    TestHelper.RecordsBuilder customersWithNewFirstNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithNewFirstNameOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, "James");
+    List<Record> customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+
+    // Run a 'select first_name' from Hive to check if the new first-name column can be queried.
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+
+    // Insert data from Hive with first_name filled and with null first_name value.
+    shell.executeStatement("INSERT INTO default.customers values (4L, 'Magenta', 'Lily'), (5L, 'Purple', NULL)");
+
+    // Check if the newly inserted data is returned correctly by select statements.
+    customersWithNewFirstNameBuilder.add(4L, "Magenta", "Lily").add(5L, "Purple", null);
+    customersWithNewFirstName = customersWithNewFirstNameBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstName,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstName, rows), 0);
+
+    customersWithNewFirstNameOnlyBuilder.add(4L, "Lily").add(5L, null);
+    customersWithNewFirstNameOnly = customersWithNewFirstNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, first_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithNewFirstNameOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithNewFirstNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testRenameColumnInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Rename the last_name column to family_name
+    icebergTable.updateSchema().renameColumn("last_name", "family_name").commit();
+
+    Schema schemaWithFamilyName = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "family_name", Types.StringType.get(), "This is last name"));
+
+    // Run a 'select *' from Hive to check if the same records are returned in the same order as before the rename.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    Schema shemaWithFamilyNameOnly = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"));
+    TestHelper.RecordsBuilder customersWithFamilyNameOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(shemaWithFamilyNameOnly).add(0L, "Brown").add(1L, "Green").add(2L, "Pink");
+    List<Record> customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+
+    // Run a 'select family_name' from Hive to check if the column can be queried with the new name.
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+
+    // Run a 'select last_name' to check if an exception is thrown.
+    AssertHelpers.assertThrows("should throw exception", IllegalArgumentException.class,
+        "Invalid table alias or column reference 'last_name'", () -> {
+          shell.executeStatement("SELECT last_name FROM default.customers");
+        });
+
+    // Insert some data from Hive to check if the last_name column is still can be filled.
+    shell.executeStatement("INSERT INTO default.customers values (3L, 'Lily', 'Magenta'), (4L, 'Roni', NULL)");
+
+    List<Record> newCustomers = TestHelper.RecordsBuilder.newInstance(schemaWithFamilyName).add(0L, "Alice", "Brown")
+        .add(1L, "Bob", "Green").add(2L, "Trudy", "Pink").add(3L, "Lily", "Magenta").add(4L, "Roni", null).build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(newCustomers, HiveIcebergTestUtils.valueForRow(schemaWithFamilyName, rows), 0);
+
+    customersWithFamilyNameOnlyBuilder.add(3L, "Magenta").add(4L, null);
+    customersWithFamilyNameOnly = customersWithFamilyNameOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, family_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithFamilyNameOnly,
+        HiveIcebergTestUtils.valueForRow(shemaWithFamilyNameOnly, rows), 0);
+  }
+
+  @Test
+  public void testMoveLastNameToFirstInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column in the table schema as first_column
+    icebergTable.updateSchema().moveFirst("last_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be first in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveLastNameBeforeCustomerIdInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveBefore("last_name", "customer_id").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "last_name", Types.StringType.get(), "This is last name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "first_name", Types.StringType.get(), "This is first name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Brown", 0L, "Alice")
+            .add("Green", 1L, "Bob").add("Pink", 2L, "Trudy");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Magenta', 3L, 'Lily')");
+
+    customersWithLastNameFirstBuilder.add("Magenta", 3L, "Lily");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testMoveCustomerIdAfterFirstNameInIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Move the last_name column before the customer_id in the table schema.
+    icebergTable.updateSchema().moveAfter("customer_id", "first_name").commit();
+
+    Schema customerSchemaLastNameFirst =
+        new Schema(optional(1, "first_name", Types.StringType.get(), "This is first name"),
+            optional(2, "customer_id", Types.LongType.get()),
+            optional(3, "last_name", Types.StringType.get(), "This is last name"));
+
+    TestHelper.RecordsBuilder customersWithLastNameFirstBuilder =
+        TestHelper.RecordsBuilder.newInstance(customerSchemaLastNameFirst).add("Alice", 0L, "Brown")
+            .add("Bob", 1L, "Green").add("Trudy", 2L, "Pink");
+    List<Record> customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+
+    // Run a 'select *' to check if the order of the column in the result has been changed.
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+
+    // Query the data with names and check if the result is the same as when the table was created.
+    rows = shell.executeStatement("SELECT customer_id, first_name, last_name FROM default.customers");
+    HiveIcebergTestUtils.validateData(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, rows), 0);
+
+    // Insert data from Hive to check if the last_name column has to be before the customer_id in the values list.
+    shell.executeStatement("INSERT INTO default.customers values ('Lily', 3L, 'Magenta')");
+
+    customersWithLastNameFirstBuilder.add("Lily", 3L, "Magenta");
+    customersWithLastNameFirst = customersWithLastNameFirstBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithLastNameFirst,
+        HiveIcebergTestUtils.valueForRow(customerSchemaLastNameFirst, rows), 1);
+  }
+
+  @Test
+  public void testUpdateColumnTypeInIcebergTable() throws IOException {
+    // Create an Iceberg table with int, float and decimal(2,1) types with some initial records
+    Schema schema = new Schema(optional(1, "id", Types.LongType.get()),
+        optional(2, "int_col", Types.IntegerType.get(), "This is an integer type"),
+        optional(3, "float_col", Types.FloatType.get(), "This is a float type"),
+        optional(4, "decimal_col", Types.DecimalType.of(2, 1), "This is a decimal type"));
+
+    List<Record> records = TestHelper.RecordsBuilder.newInstance(schema).add(0L, 35, 22F, BigDecimal.valueOf(13L, 1))
+        .add(1L, 223344, 555.22F, BigDecimal.valueOf(22L, 1)).add(2L, -234, -342F, BigDecimal.valueOf(-12L, 1)).build();
+
+    Table icebergTable = testTables.createTable(shell, "types_table", schema, fileFormat, records);
+
+    Schema schemaForResultSet =
+        new Schema(optional(1, "id", Types.LongType.get()), optional(2, "int_col", Types.IntegerType.get()),
+            optional(3, "float_col", Types.DoubleType.get()), optional(4, "decimal_col", Types.StringType.get()));

Review comment:
       Ah I see. It's okay then. Maybe just add a comment, explaining that we have to use these types here because of the fetch task type conversion




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kuczoram commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
kuczoram commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r656851967



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();

Review comment:
       :) Is it ok like this or do you prefer using different builders to add more records?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2407: HIVE-25264: Add tests to verify Hive can read/write after schema chan…

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2407:
URL: https://github.com/apache/hive/pull/2407#discussion_r655312262



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -1273,6 +1274,510 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testAddColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.
+    Table icebergTable = testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+
+    // Add a new column (age long) to the Iceberg table.
+    icebergTable.updateSchema().addColumn("age", Types.LongType.get()).commit();
+
+    Schema customerSchemaWithAge = new Schema(optional(1, "customer_id", Types.LongType.get()),
+        optional(2, "first_name", Types.StringType.get(), "This is first name"),
+        optional(3, "last_name", Types.StringType.get(), "This is last name"),
+        optional(4, "age", Types.LongType.get()));
+
+    Schema customerSchemaWithAgeOnly =
+        new Schema(optional(1, "customer_id", Types.LongType.get()), optional(4, "age", Types.LongType.get()));
+
+    // Also add a new entry to the table where the age column is set.
+    icebergTable = testTables.loadTable(TableIdentifier.of("default", "customers"));
+    List<Record> newCustomerWithAge = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null).build();
+    testTables.appendIcebergTable(shell.getHiveConf(), icebergTable, fileFormat, null, newCustomerWithAge);
+
+    // Do a 'select *' from Hive and check if the age column appears in the result.
+    // It should be null for the old data and should be filled for the data added after the column addition.
+    TestHelper.RecordsBuilder customersWithAgeBuilder = TestHelper.RecordsBuilder.newInstance(customerSchemaWithAge)
+        .add(0L, "Alice", "Brown", null).add(1L, "Bob", "Green", null).add(2L, "Trudy", "Pink", null)
+        .add(3L, "James", "Red", 34L).add(4L, "Lily", "Blue", null);
+    List<Record> customersWithAge = customersWithAgeBuilder.build();
+
+    List<Object[]> rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    // Do a 'select customer_id, age' from Hive to check if the new column can be queried from Hive.
+    // The customer_id is needed because of the result sorting.
+    TestHelper.RecordsBuilder customerWithAgeOnlyBuilder = TestHelper.RecordsBuilder
+        .newInstance(customerSchemaWithAgeOnly).add(0L, null).add(1L, null).add(2L, null).add(3L, 34L).add(4L, null);
+    List<Record> customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+
+    // Insert some data with age column from Hive. Insert an entry with null age and an entry with filled age.
+    shell.executeStatement(
+        "INSERT INTO default.customers values (5L, 'Lily', 'Magenta', NULL), (6L, 'Roni', 'Purple', 23L)");
+
+    customersWithAgeBuilder.add(5L, "Lily", "Magenta", null).add(6L, "Roni", "Purple", 23L);
+    customersWithAge = customersWithAgeBuilder.build();
+    rows = shell.executeStatement("SELECT * FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAge, HiveIcebergTestUtils.valueForRow(customerSchemaWithAge, rows),
+        0);
+
+    customerWithAgeOnlyBuilder.add(5L, null).add(6L, 23L);
+    customersWithAgeOnly = customerWithAgeOnlyBuilder.build();
+    rows = shell.executeStatement("SELECT customer_id, age FROM default.customers");
+    HiveIcebergTestUtils.validateData(customersWithAgeOnly,
+        HiveIcebergTestUtils.valueForRow(customerSchemaWithAgeOnly, rows), 0);
+  }
+
+  @Test
+  public void testAddRequiredColumnToIcebergTable() throws IOException {
+    // Create an Iceberg table with the columns customer_id, first_name and last_name with some initial data.

Review comment:
       This is not actually filled with initial data in this scenario. Do we want to add initial data? What would happen if we read the data back after adding the required column, would it be NULLs or errors?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org