Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/06/19 23:42:43 UTC

[jira] [Created] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Namit Jain created HIVE-3163:
--------------------------------

             Summary: enhance Deserialize to add a method to show whether the column names are derived from the Deserializer 
                 Key: HIVE-3163
                 URL: https://issues.apache.org/jira/browse/HIVE-3163
             Project: Hive
          Issue Type: Bug
            Reporter: Namit Jain


Some serdes, namely ThriftDeserializer and AvroDeserializer, contain the
column information themselves; that is, the user does not need to specify the
column schema.

Currently, this information is hard-coded. SerDeUtils has a method to get this
information from the name of the serde.

Ideally, Deserializer should be extended to add this method.
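
As a rough illustration of the proposal above, the sketch below contrasts a lookup by serde class name with a method on the deserializer itself. Everything in it is hypothetical: SketchDeserializer stands in for Hive's Deserializer interface, and the method name derivesColumnsFromSchema and the class name in the static list are assumptions made for this example, not existing Hive API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Hive's Deserializer interface; only the proposed
// method is modelled. The method name is an assumption, not Hive's API.
interface SketchDeserializer {
    /** True if the column names and types come from the deserializer itself. */
    boolean derivesColumnsFromSchema();
}

// Models a serde that carries its own schema (as a Thrift- or Avro-backed one would).
class SelfDescribingDeserializer implements SketchDeserializer {
    @Override
    public boolean derivesColumnsFromSchema() {
        return true;
    }
}

// Models a serde whose columns must be declared by the user in the table definition.
class UserSchemaDeserializer implements SketchDeserializer {
    @Override
    public boolean derivesColumnsFromSchema() {
        return false;
    }
}

class SchemaSourceDemo {
    // The "hard-coded" approach: a static list of serde class names.
    // The single entry is a made-up name used only for this sketch.
    static final Set<String> HARD_CODED = new HashSet<>(
            Arrays.asList("example.KnownSelfDescribingSerDe"));

    static boolean byName(String serdeClassName) {
        return HARD_CODED.contains(serdeClassName);
    }

    // The proposed approach: ask the deserializer instance directly.
    static boolean byMethod(SketchDeserializer d) {
        return d.derivesColumnsFromSchema();
    }

    public static void main(String[] args) {
        SketchDeserializer thirdParty = new SelfDescribingDeserializer();
        // A serde missing from the static list is not recognised by name...
        System.out.println(byName(thirdParty.getClass().getName()));
        // ...but answers for itself when asked through the interface.
        System.out.println(byMethod(thirdParty));
    }
}
```

With a method on the interface, a serde written outside Hive can declare that it supplies its own columns without any change to a central list.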

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397245#comment-13397245 ] 

Edward Capriolo commented on HIVE-3163:
---------------------------------------

I am not sure exactly why or when this is true. I have a serde that is not hard-coded in SerDeUtils and does not require columns to be defined. But yes, we should get rid of the hard-coded names. (The comments in the class also mention something about lazy loading.)
                

[jira] [Commented] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414627#comment-13414627 ] 

Edward Capriolo commented on HIVE-3163:
---------------------------------------

This static list is not the only way for a table to be columnless: the Avro deserializer does not add itself to the list and can still do this. I just created a protobuf serde that does this as well and is not in the list. I am still not exactly sure why anything needs to be listed here.
                

[jira] [Commented] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445929#comment-13445929 ] 

Edward Capriolo commented on HIVE-3163:
---------------------------------------

I remember looking at the Elephant Bird support. IIRC, it required the user to generate an additional class. I took this approach because I wanted Hive to work with the protobuf class directly. I also figured that the reflection inspector would somehow not work the way I wanted, so coding the entire piece from scratch was very attractive to me. If your data is in a different format, you can likely reuse almost all of the code; you just have to customize the input format.
                

[jira] [Updated] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-3163:
---------------------------------

    Component/s: Serializers/Deserializers
    

[jira] [Commented] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Feng Peng (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444555#comment-13444555 ] 

Feng Peng commented on HIVE-3163:
---------------------------------

Cool. I guess our use case is somewhat different from yours. We already have a ProtobufDeserializer in Elephant Bird, which currently uses the ReflectionObjectInspector and thus gets a different set of fields from the original protobuf definition. Since Elephant Bird already extracts the field names from the protobuf objects, for the time being we will just write an ObjectInspector (OI) on top of it, and replace it when the Hive ProtobufDeserializer is finalized.
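
To make the field mismatch concrete, here is a small self-contained sketch of why inspecting a class by reflection can yield different names than the schema it was generated from. PersonMessage is a hand-written stand-in, not real generated protobuf code, and its member names (bitField0_, name_, id_) and the SCHEMA_FIELDS list are illustrative assumptions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hand-written stand-in for a generated message class. The private members
// imitate the internal state generated code tends to carry; the exact names
// are assumptions made for this sketch.
class PersonMessage {
    // Field names as they would appear in the original schema definition.
    static final List<String> SCHEMA_FIELDS = Arrays.asList("name", "id");

    private int bitField0_;
    private String name_ = "";
    private int id_;

    String getName() { return name_; }
    int getId() { return id_; }
}

class FieldNameDemo {
    // What a reflection-based inspector sees: the instance fields of the class.
    static Set<String> reflectedNames(Class<?> c) {
        Set<String> names = new TreeSet<>();
        for (Field f : c.getDeclaredFields()) {
            // Skip static and compiler-generated members.
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // What an inspector built on the schema's own field names would expose.
    static Set<String> schemaNames() {
        return new TreeSet<>(PersonMessage.SCHEMA_FIELDS);
    }

    public static void main(String[] args) {
        System.out.println("reflection: " + reflectedNames(PersonMessage.class));
        System.out.println("schema:     " + schemaNames());
    }
}
```

An inspector written on top of the extracted field names, as described above, would expose the second set rather than the first.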

Thanks,
Feng
                

[jira] [Commented] (HIVE-3163) enhance Deserialize to add a method to show whether the column names are derived from the Deserializer

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444391#comment-13444391 ] 

Edward Capriolo commented on HIVE-3163:
---------------------------------------

@Feng: I started out looking at how the Thrift deserializer worked, but I was unable to get a handle on the changes that would be required. We created the Pair class because it is very annoying when Hive drops the key, as people tend to write data there for sorting. We can easily factor out the reflection pieces into common base classes.
                