Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/12/06 20:03:09 UTC

[jira] [Comment Edited] (HBASE-7233) Serializing KeyValues

    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511611#comment-13511611 ] 

Lars Hofhansl edited comment on HBASE-7233 at 12/6/12 7:02 PM:
---------------------------------------------------------------

{quote}So, set a pb header and then write out <length><bytearray> as we have now after we send the pb.{quote}That's what I was thinking, except now we send the Cells through an official DataBlockEncoder to generate the <bytearray> rather than using the custom KeyValue serializer in use right now.  We can make a new DataBlockEncoder that mimics the byte[] output of the current RPC format so it has roughly the same performance as the current client.

{quote}It won't be evolvable, right?  Unless we put a 'version' in the pb header or client{quote}We could put a version in the PB header. Probably safe to put a version in the header even if it never gets used.  I also have a version in the internal PrefixTree encoder, but an extra version byte here or there doesn't hurt anything.

{quote}It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV?{quote}Regarding the multiple <length><bytearray> here - is each section a separate RPC message, or is there a section per region from a single regionserver?
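
For illustration only, the <length><bytearray> framing with a leading version byte could look roughly like the sketch below. This is not HBase code: the class name, the FORMAT_VERSION constant, and the choice of a 4-byte big-endian length are all assumptions, and the single version byte merely stands in for whatever version field would live in the pb header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of <version><length><bytearray><length><bytearray>... framing.
final class LengthPrefixedFraming {
    // Assumed version value; a stand-in for a version field in the pb header.
    static final byte FORMAT_VERSION = 1;

    // Writes one version byte, then each section as a 4-byte length plus its bytes.
    static byte[] write(List<byte[]> sections) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(FORMAT_VERSION);
            for (byte[] section : sections) {
                out.writeInt(section.length); // big-endian int length prefix
                out.write(section);           // opaque, already-encoded bytes
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the sections back, rejecting versions this reader does not know.
    static List<byte[]> read(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            byte version = in.readByte();
            if (version != FORMAT_VERSION) {
                throw new IOException("unsupported format version " + version);
            }
            List<byte[]> sections = new ArrayList<>();
            while (in.available() > 0) {
                int length = in.readInt();
                if (length < 0) {
                    throw new IOException("negative section length " + length);
                }
                byte[] section = new byte[length];
                in.readFully(section); // fails with EOFException if truncated
                sections.add(section);
            }
            return sections;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader built this way can skip over a section it does not understand, since the length prefix says how many bytes to consume, and the version byte costs one byte per message even if it never changes.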

{quote}Rewriting all hfiles? Pretty controversial I'd say.{quote}Is the idea to use Protocol Buffers to write the data blocks in the HFiles?  That seems like a performance problem.  Or just the metadata like FixedFileTrailer?

{quote}I would really prefer not to double the number of kV types just to say "foo with tags". And then double again for "foo with tags and bar".{quote}That would be ugly, but at the same time it's difficult and maybe wasteful to future-proof it from every angle.  Tags are already sort of a flexible future-proofing mechanism.  Maybe tags can be added in a backwards compatible way to the existing encoders.  I'd have to think about it for PrefixTree, probably punting them to a PREFIX_TREE2 encoder with some other additions/improvements.
                
      was (Author: mcorgan):
    {quote}So, set a pb header and then write out <length><bytearray> as we have now after we send the pb.{quote}That's what I was thinking, except now we send the Cells through an official DataBlockEncoder to generate the <bytearray> rather than using the custom KeyValue serializer in use right now.  We can make a new DataBlockEncoder that mimics the byte[] output of the current RPC format so it has roughly the same performance as the current client.

{quote}It won't be evolvable, right?  Unless we put a 'version' in the pb header or client{quote}We could put a version in the PB header.{quote}Probably safe to put a version in the header even if it never gets used.  I also have a version in the internal PrefixTree encoder, but an extra version byte here or there doesn't hurt anything.

{quote}It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV?{quote}Regarding the multiple <length><bytearray> here - is each section a separate RPC message, or there is a section per region from a single regionserver?

{quote}Rewriting all hfiles? Pretty controversial I'd say.{quote}Is the idea to use Protocol Buffers to write the data blocks in the HFiles?  That seems like a performance problem.  Or just the metadata like FixedFileTrailer?

{quote}I would really prefer not to double the number of kV types just to say "foo with tags". And then double again for "foo with tags and bar".{quote}That would be ugly, but at the same time it's difficult and maybe wasteful to future-proof it from every angle.  Tags are already sort of a flexible future-proofing mechanism.  Maybe tags can be added in a backwards compatible way to the existing encoders.  I'd have to think about it for PrefixTree, probably punting them to a PREFIX_TREE2 encoder with some other additions/improvements.
                  
> Serializing KeyValues
> ---------------------
>
>                 Key: HBASE-7233
>                 URL: https://issues.apache.org/jira/browse/HBASE-7233
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: 7233.txt, 7233-v2.txt
>
>
> Undo KeyValue being a Writable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira