Posted to dev@hive.apache.org by "David Chen (JIRA)" <ji...@apache.org> on 2014/07/10 01:22:04 UTC

[jira] [Commented] (HIVE-7286) Parameterize HCatMapReduceTest for testing against all Hive storage formats

    [ https://issues.apache.org/jira/browse/HIVE-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056894#comment-14056894 ] 

David Chen commented on HIVE-7286:
----------------------------------

Hi [~szehon], thanks for taking the time to review this patch and for your feedback and advice.

I have made some progress finishing HIVE-5976 and fixing the remaining test failures. While working on that patch, however, I realized that it only covers the current set of "native" SerDes, i.e. SequenceFile, text, Parquet, ORC, and RCFile, but not Avro or any of the other SerDes found throughout the Hive codebase. I do not think that this test should be limited to those storage formats or to the ones in SERDESUSINGMETASTOREFORSCHEMA. It should cover all SerDes in the Hive codebase, especially since it is very likely that the other SerDes are actually being used; we use Avro almost exclusively here at LinkedIn.

After further thought, Avro is a special case because it requires an Avro schema to be set in the SerDe or table properties, and as a result, the test code must provide the TypeInfo to Avro Schema converter. This is a requirement that other SerDes do not have. At the same time, the TypeInfo to Avro Schema converter has good test coverage and will become useful when we make the AvroSerDe a native Hive storage format and remove the requirement for specifying an Avro schema, which should definitely be done in the future.

SerDe devs would only be required to add an entry to the table in the test with the SerDe class and nulls in the other fields. The nulls would indicate that HCatalog is not yet being tested against the new storage format.
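
To illustrate the table idea, here is a minimal sketch in plain Java. It is not the code in the patch: the class StorageFormatTable, the Entry type, and the helper untestedSerDes are hypothetical names, and com.example.NewSerDe is a made-up SerDe standing in for a newly added one. The class names are only carried as strings, so the sketch does not need Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StorageFormatTable {

    /** One row of the (hypothetical) table: a SerDe and its input/output format classes. */
    public static final class Entry {
        public final String serde;
        public final String inputFormat;   // null: not yet tested with HCatalog
        public final String outputFormat;  // null: not yet tested with HCatalog

        public Entry(String serde, String inputFormat, String outputFormat) {
            this.serde = serde;
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
        }

        /** A row counts as tested only when both formats are filled in. */
        public boolean isTested() {
            return inputFormat != null && outputFormat != null;
        }
    }

    /** Illustrative rows; the last one is what a SerDe dev would add for a new SerDe. */
    public static final List<Entry> ENTRIES = Arrays.asList(
        new Entry("org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
                  "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
                  "org.apache.hadoop.hive.ql.io.RCFileOutputFormat"),
        new Entry("org.apache.hadoop.hive.ql.io.orc.OrcSerde",
                  "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
                  "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"),
        new Entry("com.example.NewSerDe", null, null)  // hypothetical, untested
    );

    /** Returns the SerDe names of all rows that are marked as untested. */
    public static List<String> untestedSerDes() {
        List<String> result = new ArrayList<>();
        for (Entry e : ENTRIES) {
            if (!e.isTested()) {
                result.add(e.serde);
            }
        }
        return result;
    }
}
```

A parameterized fixture would then run only over the rows where isTested() is true, while the rows with nulls remain visible as a list of formats that still need HCatalog coverage.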

I am currently blocked on HIVE-5976 because there seem to be some issues with the pre-commit tests; even so, I think I will need to spend some more time to finish that patch. Once HIVE-5976 is committed, I think we will still want to keep most of the code in this patch and just modify the test to make exceptions using the enumeration of StorageFormatDescriptor in place of the TestStorageFormat classes (which are nearly identical to StorageFormatDescriptor).

Since this patch is ready and expands the coverage of the HCatMapReduceTest tests to run against RCFile, ORC, and SequenceFile, and since HIVE-5976 will take more time to complete, I think we should go ahead and commit this patch and open a new ticket to make the necessary changes to these tests once HIVE-5976 is done. I am also working on adding a similar fixture to the HCatalog Pig Adapter tests, which also requires this patch.

> Parameterize HCatMapReduceTest for testing against all Hive storage formats
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-7286
>                 URL: https://issues.apache.org/jira/browse/HIVE-7286
>             Project: Hive
>          Issue Type: Test
>          Components: HCatalog
>            Reporter: David Chen
>            Assignee: David Chen
>         Attachments: HIVE-7286.1.patch
>
>
> Currently, HCatMapReduceTest is extended by the following test suites:
>  * TestHCatDynamicPartitioned
>  * TestHCatNonPartitioned
>  * TestHCatPartitioned
>  * TestHCatExternalDynamicPartitioned
>  * TestHCatExternalNonPartitioned
>  * TestHCatExternalPartitioned
>  * TestHCatMutableDynamicPartitioned
>  * TestHCatMutableNonPartitioned
>  * TestHCatMutablePartitioned
> These tests run against RCFile. Currently, only TestHCatDynamicPartitioned is run against any other storage format (ORC).
> Ideally, HCatalog should be tested against all storage formats supported by Hive. The easiest way to accomplish this is to turn HCatMapReduceTest into a parameterized test fixture that enumerates all Hive storage formats. Until HIVE-5976 is implemented, we would need to manually create the mapping of SerDe to InputFormat and OutputFormat. This way, we can explicitly keep track of which storage formats currently work with HCatalog and which ones are untested or have test failures. The test fixture should also use reflection to find all classes in the classpath that implement the SerDe interface and raise a failure if any of them is not enumerated.
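
The check for SerDes that are not enumerated, as described in the issue above, could look roughly like the following sketch. It makes several assumptions: the nested SerDe interface is a stand-in for Hive's real one, the three nested classes are made up for the example, and the list of candidate class names is passed in by the caller, because actually scanning the classpath needs extra code or a library that is left out here.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SerDeEnumerationCheck {

    /** Stand-in for the real SerDe interface (assumption for this sketch). */
    public interface SerDe {}

    public static class EnumeratedSerDe implements SerDe {}  // listed in the test table
    public static class ForgottenSerDe implements SerDe {}   // not listed: should be reported
    public static class NotASerDe {}                         // ignored: does not implement SerDe

    /**
     * Loads each candidate class by name and returns the names of concrete
     * implementations of iface that are missing from the enumerated set.
     */
    public static List<String> findUnenumerated(
            Class<?> iface, List<String> candidates, Set<String> enumerated) {
        List<String> missing = new ArrayList<>();
        for (String name : candidates) {
            Class<?> cls;
            try {
                cls = Class.forName(name);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Cannot load candidate class " + name, e);
            }
            boolean concrete = !cls.isInterface()
                && !Modifier.isAbstract(cls.getModifiers());
            if (concrete && iface.isAssignableFrom(cls) && !enumerated.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

The test fixture would fail whenever the returned list is non-empty, which forces anyone adding a SerDe to also add a row for it, even if that row only marks the format as untested.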



--
This message was sent by Atlassian JIRA
(v6.2#6252)