Posted to hcatalog-commits@incubator.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2011/06/20 05:13:48 UTC

[jira] [Created] (HCATALOG-49) Support Avro Data File Format in HCatalog

Support Avro Data File Format in HCatalog
-----------------------------------------

                 Key: HCATALOG-49
                 URL: https://issues.apache.org/jira/browse/HCATALOG-49
             Project: HCatalog
          Issue Type: New Feature
            Reporter: Tom White


Add input and output drivers for Avro.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HCATALOG-49) Support Avro Data File Format in HCatalog

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064180#comment-13064180 ] 

Jakob Homan commented on HCATALOG-49:
-------------------------------------

Hey Tom - I had planned on looking at what it would take to convert the haivvreo code to work with HCatalog. I haven't had a chance to go through your code yet. Do you know if one is more full-featured than the other?

> Support Avro Data File Format in HCatalog
> -----------------------------------------
>
>                 Key: HCATALOG-49
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-49
>             Project: HCatalog
>          Issue Type: New Feature
>            Reporter: Tom White
>         Attachments: HCATALOG-49.patch
>
>
> Add input and output drivers for Avro.


[jira] [Commented] (HCATALOG-49) Support Avro Data File Format in HCatalog

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064168#comment-13064168 ] 

Thejas M Nair commented on HCATALOG-49:
---------------------------------------

Comments on HCATALOG-49.patch:

- AvroInputStorageDriver.getTypedObj and AvroOutputStorageDriver.getTypedObj should be called recursively for map and list values. Complex types are already supported by the schema validation done in TypeConverter.

- AvroInputStorageDriver.convertToHCatRecord converts field names to lower case before looking them up in the Avro schema, but TypeConverter.check doesn't. Does Avro's Schema.getField do a case-sensitive comparison of field names?

- AvroOutputStorageDriver.convertValue - it would be a bit more efficient to loop over the field positions (for (int i=0; i < outputSchema. ...); that way the column name does not have to be looked up in outputSchema (i.e. HCatRecord.get(i) can be used instead of HCatRecord.get(name, schema)).

- TestAvroInputStorageDriver - I think it would be useful to have some test cases for when only some of the fields are requested, and for when some of the fields are partition keys.

(FYI, I am not a committer on HCatalog.)
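The first two review points can be illustrated with a minimal Java sketch. It assumes the input driver receives values in the shape Avro's generic reader produces (strings as a CharSequence such as org.apache.avro.util.Utf8, arrays as java.util.List, maps as java.util.Map with CharSequence keys). The class and method names (AvroReadSketch, toHCatValue, findFieldIndex) are hypothetical, not taken from the patch, and Avro records (structs) are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helpers illustrating the review comments; not code from HCATALOG-49.patch. */
final class AvroReadSketch {

    /**
     * Recursively converts a value from Avro's generic representation into plain Java
     * values: any CharSequence (e.g. Utf8) becomes a String, and list elements and map
     * keys/values are converted at every nesting level, not only at the top level.
     */
    static Object toHCatValue(Object v) {
        if (v instanceof CharSequence) {
            return v.toString();
        }
        if (v instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object elem : (List<?>) v) {
                out.add(toHCatValue(elem)); // recurse into array elements
            }
            return out;
        }
        if (v instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                // Map keys arrive as CharSequence too, so they are converted as well.
                out.put(toHCatValue(e.getKey()), toHCatValue(e.getValue()));
            }
            return out;
        }
        return v; // numbers, booleans and null pass through unchanged
    }

    /**
     * Resolves an HCatalog column name to a position among the Avro record's field names,
     * lower-casing both sides so the read path and the schema check agree.
     * Returns -1 if the name is absent; rejects names that collide once case is ignored.
     */
    static int findFieldIndex(List<String> avroFieldNames, String hcatName) {
        String wanted = hcatName.toLowerCase(Locale.ROOT);
        int found = -1;
        for (int i = 0; i < avroFieldNames.size(); i++) {
            if (avroFieldNames.get(i).toLowerCase(Locale.ROOT).equals(wanted)) {
                if (found != -1) {
                    throw new IllegalArgumentException("Ambiguous field name: " + hcatName);
                }
                found = i;
            }
        }
        return found;
    }
}
```

Resolving each requested column to a position once, as findFieldIndex does, also fits the third point: the per-record loop can then use positional access instead of a name lookup for every value.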


[jira] [Updated] (HCATALOG-49) Support Avro Data File Format in HCatalog

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HCATALOG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HCATALOG-49:
------------------------------

    Attachment: HCATALOG-49.patch

Here is an initial attempt to support Avro in HCatalog.

Some notes:

* For output, an Avro schema is computed for the HCatalog schema by the Avro output storage driver. The current patch does not allow you to specify a custom Avro schema - this would be a natural extension.
* Avro map keys must be strings, whereas they can be any type in HCatalog. The current implementation assumes that HCatalog map keys are strings, and fails if this is not true. It might be possible to relax this restriction in the future by doing type conversion.
* In HCatalog, any value can be null, whereas in Avro a value can only be null if its schema allows it (a union with "null"), which simple, non-union schemas do not. It would be possible to generate null unions in Avro, but this isn't done here. This could be a future enhancement.
* For the Avro input storage driver, the Avro schema in the Avro Data File is checked for compatibility with the HCatalog schema, and an exception is thrown if there's a mismatch.
* Byte arrays cannot be represented in HCatalog, so there is no way to read byte arrays from Avro files. (Pig has the same limitation.)
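The first three notes can be illustrated with a small Java sketch that derives an Avro schema, as JSON text, from an HCatalog column type. HType is a hypothetical, simplified stand-in for HCatalog's type metadata (not the real HCatFieldSchema API); only a few primitive types and arrays/maps are mapped, and the nullable flag models the null-union enhancement mentioned above rather than what the patch does.

```java
import java.util.Map;

/** Hypothetical, simplified stand-in for an HCatalog column type (not the real HCatalog API). */
final class HType {
    final String kind;  // "int", "bigint", "float", "double", "boolean", "string", "array" or "map"
    final HType key;    // map key type; null for other kinds
    final HType value;  // array element type or map value type; null for primitives

    private HType(String kind, HType key, HType value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }

    static HType primitive(String kind) { return new HType(kind, null, null); }
    static HType array(HType element) { return new HType("array", null, element); }
    static HType map(HType key, HType value) { return new HType("map", key, value); }
}

/** Sketch: derive an Avro schema (JSON text) from an HType. */
final class AvroSchemaSketch {
    // Primitive HCatalog type name -> Avro primitive schema, as a JSON string literal.
    private static final Map<String, String> PRIMITIVES = Map.of(
            "int", "\"int\"",
            "bigint", "\"long\"",
            "float", "\"float\"",
            "double", "\"double\"",
            "boolean", "\"boolean\"",
            "string", "\"string\"");

    /**
     * Returns the Avro schema for t. Non-string map keys are rejected because Avro map
     * keys are always strings. If nullable is true, every schema (nested ones included)
     * is wrapped in a union with "null" (the suggested enhancement, not the patch).
     */
    static String toAvroJson(HType t, boolean nullable) {
        String s;
        switch (t.kind) {
            case "array":
                s = "{\"type\":\"array\",\"items\":" + toAvroJson(t.value, nullable) + "}";
                break;
            case "map":
                if (!"string".equals(t.key.kind)) {
                    throw new IllegalArgumentException(
                            "Avro map keys must be strings, got " + t.key.kind);
                }
                s = "{\"type\":\"map\",\"values\":" + toAvroJson(t.value, nullable) + "}";
                break;
            default:
                s = PRIMITIVES.get(t.kind);
                if (s == null) {
                    throw new IllegalArgumentException("Unsupported type: " + t.kind);
                }
        }
        return nullable ? "[\"null\"," + s + "]" : s;
    }
}
```

Putting "null" first in the union is the usual Avro idiom for an optional value, since it also allows the field to declare a null default. Emitting JSON text keeps the sketch free of dependencies; with the Avro library on the classpath, the string could be handed to Avro's Schema.Parser.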
