You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/09 11:02:19 UTC

[GitHub] [iceberg] marton-bod opened a new pull request #1570: Run hive storage handler tests for both Parquet and ORC

marton-bod opened a new pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570


   It would be great to increase our Hive test coverage for ORC as well. One way to make progress on that is to run the same StorageHandler tests for both file formats (Parquet and ORC). 
   
   This would be ideally done by using the JUnit `Parameterized` test runner, however, we're already using the `StandaloneHiveRunner` test runner, so can't use both on the same class. @massdosage are you aware of any easy ways to use HiveRunner without this test runner? Couldn't find any Rules-based implementation for it, and pulling out the logic of the HiveShell initialization from the Runner seemed complicated enough to make it not worth it. Similarly, I didn't find any Rules-based implementation for `Parameterized` either, therefore I had to resort to simply writing new test methods for ORC.
   
   It's not elegant, but given the constraints, this comes out to be the least-complicated approach still - and gives us the test coverage we need. I'm happy to use a different approach if you guys have any other ideas on how to make this work in a way that 'scales' better. Thanks!
   @rdblue @massdosage @pvary 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#discussion_r502830214



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerBaseTest.java
##########
@@ -136,18 +136,50 @@ public void after() throws Exception {
     }
   }
 
+  // PARQUET
+
+  @Test
+  public void testScanEmptyTableParquet() throws IOException {

Review comment:
       Rather than creating more test cases, we usually parameterize the test suite. [`TestIdentityPartitionData`](https://github.com/apache/iceberg/blob/425b10cea34eef11cca9cf0d237e02274f6dc958/spark/src/test/java/org/apache/iceberg/spark/source/TestIdentityPartitionData.java) is an example. Could you use that approach instead? Or is there a reason why you chose to do it this way?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#discussion_r502838320



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerBaseTest.java
##########
@@ -136,18 +136,50 @@ public void after() throws Exception {
     }
   }
 
+  // PARQUET
+
+  @Test
+  public void testScanEmptyTableParquet() throws IOException {

Review comment:
       Sorry, my mistake!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#issuecomment-706619327


   @marton-bod, is there a reason not to add similar test cases for Avro?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#discussion_r504574338



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerBaseTest.java
##########
@@ -136,18 +136,50 @@ public void after() throws Exception {
     }
   }
 
+  // PARQUET
+
+  @Test
+  public void testScanEmptyTableParquet() throws IOException {

Review comment:
       I'm afraid this isn't possible with HiveRunner using JUnit4 rules but it is possible using JUnit5 extensions as this doesn't require a different extension to be used, the @ParameterizedTest annotation can go at the method level and then the two play nicely together. So, if/when Iceberg moves to JUnit5 we could refactor all this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#discussion_r502838173



##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerBaseTest.java
##########
@@ -136,18 +136,50 @@ public void after() throws Exception {
     }
   }
 
+  // PARQUET
+
+  @Test
+  public void testScanEmptyTableParquet() throws IOException {

Review comment:
       Hi @rdblue, thanks for taking a look! The reason I did that is because these StorageHandler tests already use a different test runner (`StandaloneHiveRunner`), therefore I couldn't use `Parameterized.class` as well. I've tried to outline this rationale in the PR description, but I'm afraid it must have gotten lost in my rather long-winded sentences... :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570#issuecomment-707131212


   @rdblue makes sense to add Avro tests as well, I guess I was just focusing too much on ORC this time around :) https://github.com/apache/iceberg/pull/1585 
   thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1570: Run hive storage handler tests for both Parquet and ORC

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1570:
URL: https://github.com/apache/iceberg/pull/1570


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org