Posted to dev@hive.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2012/07/17 20:47:34 UTC

[jira] [Created] (HIVE-3264) Add support for binary dataype to AvroSerde

Jakob Homan created HIVE-3264:
---------------------------------

             Summary: Add support for binary dataype to AvroSerde
                 Key: HIVE-3264
                 URL: https://issues.apache.org/jira/browse/HIVE-3264
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
            Reporter: Jakob Homan


When the AvroSerde was written, Hive didn't have a binary type, so Avro's byte array type is converted to an array of small ints.  Now that HIVE-2380 is in, this step isn't necessary and we can convert both Avro's bytes type and probably its fixed type to Hive's binary type.
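
To make the difference between the two representations concrete, here is a minimal sketch using only java.nio. It is not AvroSerde code: the class and method names are hypothetical, and the choice of Short as the element type is an assumption about what "small ints" means above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not taken from AvroSerde.
final class BytesRepresentationSketch {

    // Old-style view: one small-int list element per byte of the payload.
    // (Assumption: "small ints" is modelled here as java.lang.Short.)
    static List<Short> toSmallIntList(ByteBuffer avroBytes) {
        ByteBuffer view = avroBytes.duplicate(); // leave the caller's position untouched
        List<Short> out = new ArrayList<>(view.remaining());
        while (view.hasRemaining()) {
            out.add((short) view.get());
        }
        return out;
    }

    // New-style view: the same payload as a single byte[] for a binary column.
    static byte[] toBinary(ByteBuffer avroBytes) {
        ByteBuffer view = avroBytes.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(toSmallIntList(payload));             // [1, 2, 3]
        System.out.println(Arrays.toString(toBinary(payload)));  // [1, 2, 3]
    }
}
```

The list form boxes every byte individually, which is the extra step the description says is no longer necessary.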

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461542#comment-13461542 ] 

Jakob Homan commented on HIVE-3264:
-----------------------------------

Actually, can we add a describe on the table to the .q file to verify that Hive sees the new type correctly? Also, there should be an equivalent unit test added to TestAvroDeserializer.  Also, does this support serializing Hive binary to bytes? Sorry for this falling off my radar...
                

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

    Attachment: HIVE-3264-1.patch

This one bears a bit of scrutiny, first attempt here...
                

        

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

    Attachment: HIVE-3264-3.patch

Should work at this point, adds integration test etc.
                

        

[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461541#comment-13461541 ] 

Ashutosh Chauhan commented on HIVE-3264:
----------------------------------------

+1. Will commit if tests pass.
                

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

                 Tags: serde
               Labels: patch  (was: )
    Affects Version/s: 0.9.0
               Status: Patch Available  (was: Open)

First swipe at updating Avro-Hive binary datatype conversions.
                

        

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3264:
-----------------------------------

    Status: Open  (was: Patch Available)

Marking as open, since it looks like it needs some more work.
                

[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499866#comment-13499866 ] 

Eli Reisman commented on HIVE-3264:
-----------------------------------

This had fallen off my radar too, sorry. What needs to be done/added? When I was originally working on this, I was told the .q file approach was the test we needed. What sort of test should I add?

                

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

    Attachment: HIVE-3264-4.patch

Managed to miss the inclusion of one of the test files, should be a complete patch now.
                

        

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

    Attachment: HIVE-3264-5.patch

Added a .q.out file to the party, still figuring out the build process here. I'm guessing if I say this is a complete patch at this point it will represent some form of jinx, but I'm going to go ahead and say it.

                

        

[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426909#comment-13426909 ] 

Eli Reisman commented on HIVE-3264:
-----------------------------------

Looking at the source, I'm seeing AvroSerDe code mapping BYTE to Constants.TINYINT_TYPE_NAME in Hive, but the binary type (array/blob) already maps to Constants.BINARY_TYPE_NAME. Inside the Hive constants I see no "byte" type to replace TINYINT_TYPE_NAME; is there something I'm missing? I could not tell from the description whether the binary type is what you were discussing (it looks like it's already in place) or the single-byte type specifically. In the latter case, there is no replacement I can find for tinyint on the Hive side. I didn't see anything in the code where the binary array is still implemented as tiny ints. I am looking in the org.apache.hadoop.hive.serde2.* packages.
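
For reference, a rough sketch of the type-name mapping the issue description seems to be asking for. The enum and class below are hypothetical stand-ins, not Hive or Avro classes; the string values are the usual Hive type names. Avro has no single-byte primitive, so nothing here maps to "tinyint"; only the multi-byte bytes and fixed types are candidates for "binary".

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for the schema types discussed here.
final class AvroToHiveTypeSketch {

    enum AvroType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, BYTES, FIXED }

    // Proposed mapping per the issue description: bytes and (probably) fixed
    // both become Hive's binary type instead of an array of small ints.
    static Map<AvroType, String> proposedMapping() {
        Map<AvroType, String> mapping = new EnumMap<>(AvroType.class);
        mapping.put(AvroType.BOOLEAN, "boolean");
        mapping.put(AvroType.INT, "int");
        mapping.put(AvroType.LONG, "bigint");
        mapping.put(AvroType.FLOAT, "float");
        mapping.put(AvroType.DOUBLE, "double");
        mapping.put(AvroType.STRING, "string");
        mapping.put(AvroType.BYTES, "binary");
        mapping.put(AvroType.FIXED, "binary");
        return mapping;
    }

    public static void main(String[] args) {
        proposedMapping().forEach((avro, hive) -> System.out.println(avro + " -> " + hive));
    }
}
```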

                

        

[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461605#comment-13461605 ] 

Ashutosh Chauhan commented on HIVE-3264:
----------------------------------------

I see that the describe is already in the .q file and it's printing the expected output as well. Adding more unit tests is always welcome. It looks like it will serialize binary to bytes; I see the code there. But there are no tests for that path (the .q file only tests deserialization), so I'm not sure. It would be good to add a test for it.
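
As a rough idea of what a test for the serialization direction could check, here is a round-trip sketch using plain java.nio. It does not call the real AvroSerializer or AvroDeserializer; the helper names are hypothetical and only model the byte[] to ByteBuffer hand-off.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: models the binary <-> bytes hand-off, not the real serde classes.
final class BinaryRoundTripSketch {

    // Serialization direction: a Hive binary value wrapped for an Avro bytes field.
    // The copy keeps later changes to the caller's array out of the buffer.
    static ByteBuffer serialize(byte[] hiveBinary) {
        return ByteBuffer.wrap(Arrays.copyOf(hiveBinary, hiveBinary.length));
    }

    // Deserialization direction: read the buffer's remaining bytes into a byte[].
    static byte[] deserialize(ByteBuffer avroBytes) {
        ByteBuffer view = avroBytes.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    // The property a unit test would assert: nothing is lost or reordered.
    static boolean roundTrips(byte[] original) {
        return Arrays.equals(original, deserialize(serialize(original)));
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(new byte[] {}));                 // true
        System.out.println(roundTrips(new byte[] {0, -1, 127, -128})); // true
    }
}
```

A real test would presumably push a record through the serializer and deserializer with an actual Avro schema; the cases worth covering are the same (empty payload, negative byte values).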
                

[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated HIVE-3264:
------------------------------

    Attachment: HIVE-3264-2.patch

Makes sure AvroDeserializer unwraps byte[] from ByteBuffer before returning deserialized byte[].

Still wondering why the tests didn't catch this the first time; looking at them, possibly a subsequent patch/JIRA on the way for that one...
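
For anyone following along, a small sketch of the unwrapping step and one pitfall around it. This is not taken from the patch; the class and method names are hypothetical. It shows why copying the buffer's remaining bytes is safer than calling array(), which returns the whole backing array regardless of position and limit.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: hypothetical helper, not the code in the attached patch.
final class ByteBufferUnwrapSketch {

    // Copy exactly the bytes between position and limit, without moving
    // the caller's position.
    static byte[] unwrap(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] backing = {9, 1, 2, 3, 9};
        // A buffer that exposes only the middle three bytes of the backing array.
        ByteBuffer window = ByteBuffer.wrap(backing, 1, 3);
        System.out.println(Arrays.toString(window.array()));  // [9, 1, 2, 3, 9]
        System.out.println(Arrays.toString(unwrap(window)));  // [1, 2, 3]
    }
}
```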

                