Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/12/06 20:03:09 UTC
[jira] [Comment Edited] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511611#comment-13511611 ]
Lars Hofhansl edited comment on HBASE-7233 at 12/6/12 7:02 PM:
---------------------------------------------------------------
{quote}So, set a pb header and then write out <length><bytearray> as we have now after we send the pb.{quote}That's what I was thinking, except now we send the Cells through an official DataBlockEncoder to generate the <bytearray> rather than using the custom KeyValue serializer in use right now. We can make a new DataBlockEncoder that mimics the byte[] output of the current RPC format so it has roughly the same performance as the current client.
{quote}It won't be evolvable, right? Unless we put a 'version' in the pb header or client{quote}We could put a version in the PB header. Probably safe to put a version in the header even if it never gets used. I also have a version in the internal PrefixTree encoder, but an extra version byte here or there doesn't hurt anything.
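To make the framing under discussion concrete, here is a minimal sketch of a versioned header followed by repeated <length><bytearray> sections. It is an illustration only, not HBase code: the class name FramedCells and its methods are hypothetical, a single version byte stands in for the version field that would live in the PB header, and each byte array is treated as an opaque blob where the proposal would have the output of a DataBlockEncoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of version + <length><bytearray> framing; not HBase code.
public class FramedCells {

    // Writes one version byte (a stand-in for a version field in the PB header),
    // then a 4-byte length followed by the raw bytes for each payload.
    public static byte[] write(int version, List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(version);
        for (byte[] payload : payloads) {
            out.writeInt(payload.length);
            out.write(payload);
        }
        out.flush();
        return buf.toByteArray();
    }

    // Reads the version byte, rejects versions the reader does not understand,
    // then reads length-prefixed payloads until the input is exhausted.
    public static List<byte[]> read(byte[] framed, int expectedVersion) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        int version = in.readUnsignedByte();
        if (version != expectedVersion) {
            throw new IOException("unsupported version " + version);
        }
        List<byte[]> payloads = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            payloads.add(payload);
        }
        return payloads;
    }
}
```

The version costs one byte per message, and a reader can refuse or branch on it before it touches any payload bytes, which is why carrying it even if it never gets used is cheap insurance.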
{quote}It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV?{quote}Regarding the multiple <length><bytearray> sections here: is each section a separate RPC message, or is there one section per region from a single regionserver?
{quote}Rewriting all hfiles? Pretty controversial I'd say.{quote}Is the idea to use Protocol Buffers to write the data blocks in the HFiles? That seems like a performance problem. Or just the metadata like FixedFileTrailer?
{quote}I would really prefer not to double the number of kV types just to say "foo with tags". And then double again for "foo with tags and bar".{quote}That would be ugly, but at the same time it's difficult and maybe wasteful to future-proof it from every angle. Tags are already sort of a flexible future-proofing mechanism. Maybe tags can be added in a backwards compatible way to the existing encoders. I'd have to think about it for PrefixTree, probably punting them to a PREFIX_TREE2 encoder with some other additions/improvements.
was (Author: mcorgan):
{quote}So, set a pb header and then write out <length><bytearray> as we have now after we send the pb.{quote}That's what I was thinking, except now we send the Cells through an official DataBlockEncoder to generate the <bytearray> rather than using the custom KeyValue serializer in use right now. We can make a new DataBlockEncoder that mimics the byte[] output of the current RPC format so it has roughly the same performance as the current client.
{quote}It won't be evolvable, right? Unless we put a 'version' in the pb header or client{quote}We could put a version in the PB header.{quote}Probably safe to put a version in the header even if it never gets used. I also have a version in the internal PrefixTree encoder, but an extra version byte here or there doesn't hurt anything.
{quote}It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV?{quote}Regarding the multiple <length><bytearray> sections here: is each section a separate RPC message, or is there one section per region from a single regionserver?
{quote}Rewriting all hfiles? Pretty controversial I'd say.{quote}Is the idea to use Protocol Buffers to write the data blocks in the HFiles? That seems like a performance problem. Or just the metadata like FixedFileTrailer?
{quote}I would really prefer not to double the number of kV types just to say "foo with tags". And then double again for "foo with tags and bar".{quote}That would be ugly, but at the same time it's difficult and maybe wasteful to future-proof it from every angle. Tags are already sort of a flexible future-proofing mechanism. Maybe tags can be added in a backwards compatible way to the existing encoders. I'd have to think about it for PrefixTree, probably punting them to a PREFIX_TREE2 encoder with some other additions/improvements.
> Serializing KeyValues
> ---------------------
>
> Key: HBASE-7233
> URL: https://issues.apache.org/jira/browse/HBASE-7233
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Attachments: 7233.txt, 7233-v2.txt
>
>
> Undo KeyValue being a Writable.