Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2012/11/28 21:47:58 UTC

[jira] [Created] (HBASE-7233) Remove Writable Interface from KeyValue

stack created HBASE-7233:
----------------------------

             Summary: Remove Writable Interface from KeyValue
                 Key: HBASE-7233
                 URL: https://issues.apache.org/jira/browse/HBASE-7233
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
            Priority: Blocker


Undo KeyValue being a Writable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508115#comment-13508115 ] 

stack commented on HBASE-7233:
------------------------------

Or, I wonder, maybe we should go for Cell now.  In pb, we should have Cell rather than KV, and we should call out the KV particles -- row, family, etc. -- which would align w/ how Cell works.  Yeah, it's less efficient, but we ain't doing much KV pb serializing it seems.  (What's happening inside Result?  Are we not using the KV protobuf there?)
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507441#comment-13507441 ] 

stack commented on HBASE-7233:
------------------------------

[~mcorgan] How do you mean (re: -ROOT- and .META.)?  Should be ok given we are already protobuf serializing here.  This patch comes after that work.  Let me check KeyValueSortReducer.  On a cursory glance, I don't see us specifying a Serializer.  Will run the unit test.

[~ted_yu] Thanks.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511254#comment-13511254 ] 

Andrew Purtell commented on HBASE-7233:
---------------------------------------

bq. We'd package the kv appropriately... version1 if that was what they asked for.  If they asked for version2, they'd get Andrew's tags if any specified?

On disk encoding. The tags should be serialized with the KV, inline, so can be read with the KV data in the same read op. 

What I'm doing now, for backwards compatibility, is writing the value length as a negative integer to flag the presence of tags, and storing the tags prepended to the user data as part of the value section of the KV. It's ugly. Or, as mentioned, I store tags separately from their associated KVs, as KVs in a shadow column family. With the latter, especially when you increase block cache pressure, you can see a significant latency penalty on gets. Putting tags inline seems wise. How to get them in? And what about future evolution of KV? I would really prefer not to double the number of KV types just to say "foo with tags", and then double again for "foo with tags and bar".
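
A minimal sketch of the negative-value-length trick described above, for illustration only. The class name TaggedValueCodec, the 2-byte tags-length prefix, and the exact byte layout are assumptions; the comment only says that a negative value length flags tags prepended to the user data in the value section.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec for illustration; this is not HBase code. */
final class TaggedValueCodec {
    /**
     * Writes [int valueLength][value section]. A negative length signals that
     * the section starts with [short tagsLength][tags], followed by the user value.
     * Assumes the tags fit in 32767 bytes.
     */
    static byte[] encode(byte[] tags, byte[] value) {
        if (tags.length == 0) {
            // No tags: plain, non-negative length, readable by old code.
            return ByteBuffer.allocate(4 + value.length)
                    .putInt(value.length).put(value).array();
        }
        int sectionLength = 2 + tags.length + value.length;
        return ByteBuffer.allocate(4 + sectionLength)
                .putInt(-sectionLength)          // negative length flags tags
                .putShort((short) tags.length)
                .put(tags).put(value).array();
    }

    /** Returns {tags, value}; tags is empty when the stored length is non-negative. */
    static byte[][] decode(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int length = in.getInt();
        byte[] tags = new byte[0];
        if (length < 0) {
            length = -length;
            tags = new byte[in.getShort()];
            in.get(tags);
            length -= 2 + tags.length;           // what remains is the user value
        }
        byte[] value = new byte[length];
        in.get(value);
        return new byte[][] {tags, value};
    }
}
```

The sketch also shows why the approach is awkward: every reader of the value section has to know about the sign convention, which is the kind of field overloading a versioned KV serialization would avoid.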
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508079#comment-13508079 ] 

stack commented on HBASE-7233:
------------------------------

bq. (as said on the mailing list, if KV keeps the Writable interface I'll still be happy)

I see that now.  I missed it because I was too busy stripping Writables this week in reaction to your rant.

It looks like it will take little to undo KeyValue being a Writable, so let's press ahead.  It undoes another piece of the Writable contamination, and having serialization disentangled can only help with our move to the Cell Interface.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511051#comment-13511051 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

bq. As I see it then, we'll send a pb Result and then on the wire, it'll be directly followed by an encoded block of KVs.
That makes sense. We would need to be extremely careful to still have wire compatibility. I.e., when a new serialization format comes along for the KV block, we cannot just send the new encoding along (even when announced in the header); the other side would not know what to do with it.

bq. except when doing secure connection.. there we need to sasl wrap the byte array response
That's interesting. Is there no way around this?

We could use a GatheringByteChannel and then assemble the response piecemeal.
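
A minimal sketch of the gathering-write idea above: hand the channel a header buffer and a KV block buffer and let it write both in order, with no intermediate copy into one response array. The class and method names are hypothetical, and a temporary file's FileChannel stands in for the RPC socket channel. It does not address the SASL wrapping question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; this is not HBase RPC code. */
final class GatheringResponseWriter {
    /** Writes all buffers in order; assumes a blocking channel. */
    static void writeFully(GatheringByteChannel channel, ByteBuffer... parts)
            throws IOException {
        long remaining = 0;
        for (ByteBuffer part : parts) {
            remaining += part.remaining();
        }
        while (remaining > 0) {
            // A gathering write may be partial, so loop until all bytes are out.
            remaining -= channel.write(parts);
        }
    }

    /** Assembles a stand-in "header + KV block" response piecemeal. */
    static byte[] demo() {
        try {
            Path tmp = Files.createTempFile("response", ".bin");
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer header = ByteBuffer.wrap(new byte[] {'H', 'D', 'R'});
                ByteBuffer kvBlock = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
                writeFully(channel, header, kvBlock);
            }
            byte[] written = Files.readAllBytes(tmp);
            Files.delete(tmp);
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```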

                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507170#comment-13507170 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

Need to watch our step with META and ROOT cells too until we figure that out
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508099#comment-13508099 ] 

stack commented on HBASE-7233:
------------------------------

bq. Looking at the KeyValue in HBase.proto, though, it is not used anywhere; also, it seems to require disassembling the KV into Row/CF/Qual, which will be inefficient.

Well, if the KV atom is broken up into its particles, I'd think we'll be able to migrate its format over time.

Or, since KV is coming to its end of life, just redo its pb format so serialization is kv bytes -- 'more efficient' -- and we'll do serialization differently when we move over to the Cell Interface.

On the MR code not having to change because KeyValueSerialization will encapsulate KV protos, that's better still.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510802#comment-13510802 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

Most of the ProtocolBuffer uses are not performance critical, and PB gives great flexibility and a well-known paradigm, but sending big chunks of Cells over the wire as fast as possible in a long scan is worth a special case, I'd say.  Using the DataBlockEncoding stuff might consume roughly the same CPU as PB encoding on the server, but it will save a ton of network bandwidth for many tables and would be much easier for the client to decode.
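
To illustrate why an encoded block can save bandwidth: cells in a scan arrive sorted, so consecutive keys tend to share long prefixes. The following is a simplified prefix encoder, not HBase's actual DataBlockEncoding format; the class name and the [short prefixLength][short suffixLength][suffix] layout are assumptions made for the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical prefix encoder for illustration; keys must be under 32 KB. */
final class PrefixKeyEncoder {
    /** Encodes each key as the length of the prefix shared with the previous key plus the remaining suffix. */
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] previous = new byte[0];
            for (byte[] key : sortedKeys) {
                int common = commonPrefix(previous, key);
                out.writeShort(common);
                out.writeShort(key.length - common);
                out.write(key, common, key.length - common);
                previous = key;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the full keys by reusing the shared prefix of the previous key. */
    static List<byte[]> decode(byte[] encoded) {
        List<byte[]> keys = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(encoded);
        byte[] previous = new byte[0];
        while (in.hasRemaining()) {
            int common = in.getShort();
            int suffixLength = in.getShort();
            byte[] key = Arrays.copyOf(previous, common + suffixLength);
            in.get(key, common, suffixLength);
            keys.add(key);
            previous = key;
        }
        return keys;
    }

    static int commonPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }
}
```

The saving grows with key length and with how much neighbouring keys overlap; with short or unrelated keys the 4-byte per-key overhead of this toy layout can outweigh it.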
                

[jira] [Updated] (HBASE-7233) Serializing KeyValues

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-7233:
-------------------------

    Summary: Serializing KeyValues  (was: Serializing KeyValues when passing them over RPC)
    

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508077#comment-13508077 ] 

stack commented on HBASE-7233:
------------------------------

There is one in hbase.proto already:


{code}
/**
 * The type of the key in a KeyValue.
 */
enum KeyType {
    MINIMUM = 0;
    PUT = 4;

    DELETE = 8;
    DELETE_COLUMN = 12;
    DELETE_FAMILY = 14;

    // MAXIMUM is used when searching; you look from maximum on down.
    MAXIMUM = 255;
}

/**
 * Protocol buffer version of KeyValue.
 * It doesn't have those transient parameters
 */
message KeyValue {
  required bytes row = 1;
  required bytes family = 2;
  required bytes qualifier = 3;
  optional uint64 timestamp = 4;
  optional KeyType keyType = 5;
  optional bytes value = 6;
}
{code}

Are you suggesting that we change KeyValueSortReducer from:

{code}
public class KeyValueSortReducer extends Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {
{code}

to

{code}
public class KeyValueSortReducer extends Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, HBaseProtos.KeyValue> {
                                                                                                           ^^^^^^^^^^^^^^^^^^^^^
{code}

HBaseProtos.KeyValue extends https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/GeneratedMessage

This has what I started to list in the above proposed Interface, including writeTo and writeDelimitedTo, etc.

I think this is a good idea.  No pollution of KV or Cell w/ serialization.

Let me add it.
                

[jira] [Updated] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-7233:
-------------------------

    Attachment: 7233.txt

HBASE-1379 added Writable Interface to KV.

This patch removes it.

In WALEdit it does a bit of placeholding till we convert WALEdit from being a Writable.  Need to also chat w/ Matt Corgan after he is done drinking his Champagne about how we'll do serialization/deserialization of KVs/Cells in his new Interface.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511233#comment-13511233 ] 

stack commented on HBASE-7233:
------------------------------

bq. I like the idea of KeyValue encoder.

It'd write <length><bytearray><length><bytearray> and the byte array would be the backing array of a KV?  The format version would be in the pb preamble.  Client would volunteer what it could digest.  We'd package the kv appropriately... version1 if that was what they asked for.  If they asked for version2, they'd get Andrew's tags if any specified?

A step above this would be a datablock encoder for sending lots of KVs in a compact form.

bq. How controversial is this?

Rewriting all hfiles?  Pretty controversial I'd say.  Maybe you were talking about how tricky versioning KV is?

Changed title of issue.  Moved its original intent, removing Writable from KV, to HBASE-7289.
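
A minimal sketch of the <length><bytearray><length><bytearray> framing discussed above, assuming a 4-byte big-endian length per KV. The class name is hypothetical, and the leading version byte is only a stand-in for the format version that the comment places in the pb preamble.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical KV stream framing for illustration; this is not HBase code. */
final class LengthPrefixedKvStream {
    static final int VERSION_1 = 1;

    /** Writes [byte version] followed by [int length][bytes] for each KV backing array. */
    static byte[] write(int version, List<byte[]> kvBackingArrays) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(version);
            for (byte[] kv : kvBackingArrays) {
                out.writeInt(kv.length);
                out.write(kv);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the stream back; rejects any version this reader does not understand. */
    static List<byte[]> read(byte[] stream) {
        List<byte[]> kvs = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            int version = in.readUnsignedByte();
            if (version != VERSION_1) {
                throw new IllegalArgumentException("Unsupported KV stream version " + version);
            }
            while (in.available() > 0) {
                byte[] kv = new byte[in.readInt()];
                in.readFully(kv);
                kvs.add(kv);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kvs;
    }
}
```

A reader that sees an unknown version can only refuse the stream, which is why the thread has the client state up front which versions it can digest.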


                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511156#comment-13511156 ] 

stack commented on HBASE-7233:
------------------------------

Yeah, will have to keep versions on datablockencoding.

Clients other than hbase clients will be pretty hosed; if they are doing pure pb, hbase will be dog slow marshaling and unmarshaling, and if they want to go faster, they'll have to implement datablockencoding in whatever language they use.

Looking, avro would let us pass schema independent of data -- say at connection setup -- and because the schema is external, we could have a tight on-the-wire representation.  It lets you stream too, it seems (haven't looked in code).  Thrift supposedly too.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507169#comment-13507169 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

I'm not up to speed on hbase/map-reduce integration.  Will it still work ok with the KeyValueSortReducer?

Otherwise, it's pretty easy to mimic the writable format within hbase.  There are some methods in KeyValueTool that take a Cell parameter and write the KeyValue format bytes to ByteBuffers and arrays.  Easy to add more for OutputStreams, etc.
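
A minimal sketch of writing KeyValue-format bytes into a ByteBuffer. The layout assumed here (key length, value length, then row length, row, family length, family, qualifier, timestamp, type, value) is the classic untagged KeyValue format as I understand it, so check it against KeyValue.java before relying on it. The class and method names are hypothetical and are not the KeyValueTool API.

```java
import java.nio.ByteBuffer;

/** Hypothetical writer for illustration; this is not the KeyValueTool API. */
final class KeyValueLayoutSketch {
    /** Number of bytes write() will produce for the given parts. */
    static int encodedLength(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        return 4 + 4 + keyLength(row, family, qualifier) + value.length;
    }

    /** Key = short rowLength + row + byte familyLength + family + qualifier + long timestamp + byte type. */
    static int keyLength(byte[] row, byte[] family, byte[] qualifier) {
        return 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
    }

    /** Appends one cell to the buffer in the assumed KeyValue layout. */
    static void write(ByteBuffer out, byte[] row, byte[] family, byte[] qualifier,
                      long timestamp, byte type, byte[] value) {
        out.putInt(keyLength(row, family, qualifier));
        out.putInt(value.length);
        out.putShort((short) row.length);
        out.put(row);
        out.put((byte) family.length);
        out.put(family);
        out.put(qualifier);          // qualifier length is implied by the key length
        out.putLong(timestamp);
        out.put(type);               // e.g. 4 for PUT in the KeyType enum
        out.put(value);
    }
}
```

Because the output is a flat byte run with two length prefixes, the same method body works for arrays, ByteBuffers, or an OutputStream wrapper with little change.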
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511220#comment-13511220 ] 

stack commented on HBASE-7233:
------------------------------

bq. we can make a KEY_VALUE encoder that serializes cells in the current wire format which is pretty simple for other languages to parse. it can be a slightly more performant fallback than per-field protocol buffers

So, set a pb header and then write out <length><bytearray> as we have now after we send the pb.  It won't be evolvable, right?  Unless we put a 'version' in the pb header, or the client, I suppose, could say what version of this it wants and the server would accommodate?
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507876#comment-13507876 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

Why not add a protobuf representation for KV? Could be just a byte[] (right?) The generated class will have this interface.

(as said on the mailing list, if KV keeps the Writable interface I'll still be happy)
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507517#comment-13507517 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

Never mind about the ROOT/META comment.  I was thinking the comparator might have an effect here but maybe not.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507728#comment-13507728 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

[~mcorgan]
Came here to write that...
KVs are still used in HBase M/R, and a KV cannot currently be serialized by protobufs by itself (Mutations currently serialize the data as columns, not as contained KVs).
At the very least we need to add a KV serializer for M/R.

                

[jira] [Updated] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-7233:
--------------------------

    Attachment: 7233-v2.txt

Change in TestSerialization.java didn't apply cleanly.
Attached diff which fixes the above.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511171#comment-13511171 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

bq. Yeah, will have to keep versions on datablockencoding.
Will that be enough to have old clients talk to a new server (or vice versa)? That's what Writable did, and it did not work so well. Would client and server have to pre-negotiate what they understand?
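
A minimal sketch of the pre-negotiation asked about above: each side states the encoding versions it understands and both settle on the highest common one before any KV block is sent. The class name, the method, and the use of integer version numbers are assumptions; the thread does not specify a negotiation protocol.

```java
import java.util.OptionalInt;
import java.util.Set;

/** Hypothetical negotiation helper for illustration; this is not HBase code. */
final class EncoderNegotiation {
    /**
     * Returns the highest encoding version supported by both sides,
     * or an empty result when there is no common version.
     */
    static OptionalInt negotiate(Set<Integer> clientSupports, Set<Integer> serverSupports) {
        return clientSupports.stream()
                .filter(serverSupports::contains)
                .mapToInt(Integer::intValue)
                .max();
    }
}
```

With this shape an old client offering only version 1 still gets version 1 from a newer server, and a pairing with no overlap fails at connection setup instead of midway through a scan.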

                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511216#comment-13511216 ] 

Andrew Purtell commented on HBASE-7233:
---------------------------------------

Perhaps the title of this JIRA should be shortened to simply "Serializing KeyValues".

Using any of protobufs, Avro, or Thrift for marshalling/unmarshalling the KeyValue is unlikely to be viable: lots of object creation churn and small copies will kill performance. However, sending a protobuf encoded prologue to a stream of KVs to a client makes sense.

I like the idea of KeyValue encoder.

I also like the idea of negotiating KeyValue encoder selection at connection setup time.

Beyond RPC, I've been looking at extending KeyValue to add tags as described in HBASE-6222. What I have is a "transitional approach". No matter what else happens here, if KeyValue could be a versioned serialization, that would be great: we could introduce tags without overloading existing fields in ugly ways (e.g. writing a negative value length to indicate the presence of tags), or without storing tags physically distinct from their KVs in a separate shadow column. I have implementations that do both; the latter has some undesirable cost, as you might imagine. Versioning KeyValue is tricky if we must be backwards compatible with existing data and migration does not involve an HFile rewrite step. How controversial is this?
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510838#comment-13510838 ] 

stack commented on HBASE-7233:
------------------------------

As I see it then, we'll send a pb Result and then on the wire, it'll be directly followed by an encoded block of KVs.  The Result will describe the block that is coming immediately after.  Need to do same for Mutation sending in the data.

Hopefully I can doctor the rpc to get better access to the channel.  Currently we compose the response in a ByteBuffer that we give to a WritableByteChannel (this is after pb has done something similar when we build the messages).  Composing the response in a ByteBuffer is a known temporary stopgap while moving to pb, but we'll need to undo it before we ship (except on secure connections, where we need to SASL-wrap the byte array response).

Let me finish the baseline case where we do pure pb throughout.  Then will have a go at trying to send a follow-along encoded block.
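
The header-then-block framing described above could look roughly like the sketch below. To keep it free of a protobuf dependency, a small hand-written header stands in for the pb Result message; the class name, the header fields (encoder name, cell count, block length), and the framing itself are assumptions for illustration, not the actual RPC format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical framing: [header length][header][encoded block bytes]. */
final class HeaderThenBlock {

  /** Stand-in for the pb message that describes the block that follows it. */
  static final class BlockHeader {
    final String encoder;   // e.g. "KEY_VALUE"; the set of names is an assumption
    final int cellCount;
    final int blockLength;

    BlockHeader(String encoder, int cellCount, int blockLength) {
      this.encoder = encoder;
      this.cellCount = cellCount;
      this.blockLength = blockLength;
    }
  }

  /** Serializes the header, then appends the already-encoded block untouched. */
  static byte[] write(String encoder, int cellCount, byte[] block) throws IOException {
    ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
    DataOutputStream header = new DataOutputStream(headerBytes);
    header.writeUTF(encoder);
    header.writeInt(cellCount);
    header.writeInt(block.length);
    header.flush();

    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(wire);
    out.writeInt(headerBytes.size()); // length prefix so the reader can find the block
    headerBytes.writeTo(out);
    out.write(block);
    out.flush();
    return wire.toByteArray();
  }

  /** Reads only the header, leaving the stream positioned at the start of the block. */
  static BlockHeader readHeader(DataInputStream in) throws IOException {
    byte[] headerBytes = new byte[in.readInt()];
    in.readFully(headerBytes);
    DataInputStream header = new DataInputStream(new ByteArrayInputStream(headerBytes));
    String encoder = header.readUTF();
    int cellCount = header.readInt();
    int blockLength = header.readInt();
    return new BlockHeader(encoder, cellCount, blockLength);
  }

  /** Reads the block whose size the header announced. */
  static byte[] readBlock(DataInputStream in, BlockHeader header) throws IOException {
    byte[] block = new byte[header.blockLength];
    in.readFully(block);
    return block;
  }
}
```

Because the header carries the block length, the reader never has to parse the block to find where the response ends, so the block can stay in whatever encoding was agreed on.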
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508092#comment-13508092 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

Oh, didn't see the proto definition for KVs. In that case we only need to add a serializer (like MutationSerialization and ResultSerialization).
Lemme do that today or tomorrow.

The M/R code itself should not have to change.
                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511183#comment-13511183 ] 

stack commented on HBASE-7233:
------------------------------

bq. Will that be enough to have old clients talk to new server (or vice versa)? 

Should have said, new server would also have to be able to do the old datablockencoding formats too -- whatever the client proffered -- or else fall back to lowest common denominator pb all the time.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508416#comment-13508416 ] 

stack commented on HBASE-7233:
------------------------------

bq.  In Result we have pbs for Mutation, which serializes columns.

I don't see that.  I see:

{code}
/**
 * For performance reason, we don't use KeyValue
 * here. We use the actual KeyValue bytes.
 */
message Result {
  repeated bytes keyValueBytes = 1;
}
{code}

... which looks like it would be a bit tough to evolve.  We should change this (Am I looking in wrong place?)

bq. In M/R jobs there will be massive amounts of KVs that are streamed from the mapper to the reducer, I do not think we want to introduce serialization that needs to copy each KV in its entirety before it can be serialized.

Yeah.  Could do something particular for MR but can't be something that would stop our evolving KV/Cell over MR or over RPC.

Not sure how we'd do that though currently.  Maybe a Result has a CellInputStream into which we write the Cells (or whatever [~mcorgan] called it -- what was it, Matt?  I don't seem to see it in committed hbase-common) and we give this blob to pb to serialize... then on the other end we do the CellOutputStream reading them.

bq. I.e. we'd have to allow the pb to somehow know about the row array, row offset, and row length, as well as CF array/offset/length, qual array/offset/length, and value array/offset/length and be able to serialize the subportions of these arrays.

IIUC, not sure this is possible. I think you are implying a custom pb builder?  I may have you wrong.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510731#comment-13510731 ] 

stack commented on HBASE-7233:
------------------------------

Following up on our discussion from yesterday, we seemed to agree that hbase should ship w/ the KV pb'd broken up into its constituent elements so we can evolve KV over time.  The downside, as was voiced yesterday, is that we are pretty sure it will make hbase horribly slow, since we copy byte arrays as we add them to pb messages and copy again as pb composes the serialized version of the message for rpc (protobuf does not stream; see this protostuff page on pros/cons of pb: http://code.google.com/p/protostuff/wiki/ThingsYouNeedToKnow).

What was suggested yesterday was that the client could say what it can accept and then the server would choose how to write based on what the client volunteered.

For example, protostuff supports streaming.  If client says it can do protostuff, then we'd do protostuff interchange.

Protostuff might not be what we'd want to move to though.  Avro does not seem to stream going by a cursory glance.  We could do custom serialization for the blob that comes after a pb header identifying what follows -- how it was serialized, what version, either size or a continuation flag, etc. -- and the blob could be a prefixtree'd blob whether a Result or a Put, etc.

Let me edit the subject on this issue.  Its scope is actually broader than that mentioned.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508174#comment-13508174 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

That'd be cool! In Result we have pbs for Mutation, which serializes columns.
In M/R jobs there will be *massive* amounts of KVs that are streamed from the mapper to the reducer; I do not think we want to introduce serialization that needs to copy each KV in its entirety before it can be serialized.

If that pb can serialize subportions of an array we can use Cell now and still efficiently serialize KVs.

I.e. we'd have to allow the pb to somehow know about the row array, row offset, and row length, as well as CF array/offset/length, qual array/offset/length, and value array/offset/length and be able to serialize the subportions of these arrays.
If we have this, it no longer matters whether the data is stored in a single array or many.
(Just to state the obvious, that's the whole idea behind the Cell interface).
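
As a rough illustration of serializing subportions of arrays, the sketch below writes a cell straight from (array, offset, length) slices into the flat KeyValue layout (key length, value length, row length, row, family length, family, qualifier, timestamp, type, value) without first assembling a per-cell byte[]. `Slice` and `SliceCellWriter` are hypothetical names invented for this example; they are not the Cell interface, and this uses plain java.io rather than protobuf.

```java
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical (array, offset, length) view; several slices may share one backing array. */
final class Slice {
  final byte[] array;
  final int offset;
  final int length;

  Slice(byte[] array, int offset, int length) {
    this.array = array;
    this.offset = offset;
    this.length = length;
  }
}

/** Writes one cell in the flat KeyValue layout directly from its slices. */
final class SliceCellWriter {
  static final byte TYPE_PUT = 4; // type code for a Put in the KeyValue format

  static void writeCell(DataOutputStream out, Slice row, Slice family, Slice qualifier,
      long timestamp, byte type, Slice value) throws IOException {
    // Key = 2-byte row length + row + 1-byte family length + family + qualifier
    //       + 8-byte timestamp + 1-byte type.
    int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
    out.writeInt(keyLength);
    out.writeInt(value.length);
    out.writeShort(row.length);
    out.write(row.array, row.offset, row.length);
    out.writeByte(family.length);
    out.write(family.array, family.offset, family.length);
    out.write(qualifier.array, qualifier.offset, qualifier.length);
    out.writeLong(timestamp);
    out.writeByte(type);
    out.write(value.array, value.offset, value.length);
  }
}
```

The only copies here are the writes into the stream's own buffer. Whether the four parts live in one backing array or in four makes no difference to the writer, and the receiver can still treat the bytes it reads as a single contiguous KeyValue.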

                

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511196#comment-13511196 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

few thoughts:
- we can make a KEY_VALUE encoder that serializes cells in the current wire format which is pretty simple for other languages to parse.  it can be a slightly more performant fallback than per-field protocol buffers
- encoders will have to be backwards compatible for a while on the server anyway because people have lots of hfiles encoded with them
- encoders could have versions, but they are also pretty intricate, so any changes might merit a whole new encoder like FAST_DIFF2
- the client could pass a short list of encoder options in descending order of preference like FAST_DIFF2, KEY_VALUE, PB, where PB is the forever-supported fallback
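
The preference-list negotiation in the last bullet could be sketched as follows. This is only an illustration: `EncoderNegotiation` and `choose` are hypothetical names, and encoders are represented as plain strings rather than any real HBase type.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical server-side choice of encoder from a client's ordered preference list. */
final class EncoderNegotiation {
  static final String FALLBACK = "PB"; // the always-supported fallback from the list above

  /**
   * Returns the first client preference the server also supports,
   * or the PB fallback when there is no overlap.
   */
  static String choose(List<String> clientPreferences, Set<String> serverSupported) {
    for (String candidate : clientPreferences) {
      if (serverSupported.contains(candidate)) {
        return candidate;
      }
    }
    return FALLBACK;
  }
}
```

For example, a client offering FAST_DIFF2, KEY_VALUE, PB to a server that only knows KEY_VALUE and PB would end up on KEY_VALUE, and a client offering nothing the server recognizes would end up on PB.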

I'm a little skeptical that this will be the last client hbase ever supports.  If something really major changes, we could make a whole new client and the server could translate things to support the old client.
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508097#comment-13508097 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

Looking at the KeyValue in HBase.proto, though, it is not used anywhere. It also seems to require disassembling the KV into Row/CF/Qual, which will be inefficient.
                

[jira] [Updated] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-7233:
-------------------------

    Summary: Serializing KeyValues when passing them over RPC  (was: Remove Writable Interface from KeyValue)
    

[jira] [Commented] (HBASE-7233) Serializing KeyValues when passing them over RPC

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510732#comment-13510732 ] 

Todd Lipcon commented on HBASE-7233:
------------------------------------

For the RPC transport, I'd vote that we reuse some of the "block encoder" type stuff that we've got in HFile. That way we get prefix compression on the transport of a list of KVs within RPC, which should improve performance.
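
For readers unfamiliar with prefix compression, here is a minimal sketch of the idea applied to a sorted list of keys: each key is stored as the number of leading bytes it shares with the previous key plus the remaining suffix. This is not the HFile block encoder code; `PrefixBlock` and its layout are hypothetical, and it uses fixed 4-byte lengths for simplicity where a real encoder would use something more compact.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prefix-compressed block: [count] then per key [shared][suffixLen][suffix]. */
final class PrefixBlock {

  /** Encodes keys in order, storing only the bytes that differ from the previous key. */
  static byte[] encode(List<byte[]> sortedKeys) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(sortedKeys.size());
    byte[] previous = new byte[0];
    for (byte[] key : sortedKeys) {
      int shared = commonPrefixLength(previous, key);
      out.writeInt(shared);
      out.writeInt(key.length - shared);
      out.write(key, shared, key.length - shared);
      previous = key;
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Rebuilds each key from the previous key's prefix plus the stored suffix. */
  static List<byte[]> decode(byte[] block) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
    int count = in.readInt();
    List<byte[]> keys = new ArrayList<>(count);
    byte[] previous = new byte[0];
    for (int i = 0; i < count; i++) {
      int shared = in.readInt();
      int suffixLength = in.readInt();
      byte[] key = new byte[shared + suffixLength];
      System.arraycopy(previous, 0, key, 0, shared);
      in.readFully(key, shared, suffixLength);
      keys.add(key);
      previous = key;
    }
    return keys;
  }

  /** Number of leading bytes the two arrays have in common. */
  static int commonPrefixLength(byte[] a, byte[] b) {
    int max = Math.min(a.length, b.length);
    int i = 0;
    while (i < max && a[i] == b[i]) {
      i++;
    }
    return i;
  }
}
```

With the fixed-width lengths used here the saving only shows up when keys share long prefixes, which is the common case for a scan result where consecutive KVs repeat the same row and family.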
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508471#comment-13508471 ] 

Lars Hofhansl commented on HBASE-7233:
--------------------------------------

You are right. I was looking at Mutate, which (for Put/Delete/Append) also serializes KVs, but there it does it column by column. Sigh.

Re: Custom PB. Specifically what I meant is something that can serialize a "Cell" that is composed of four separate byte[]'s (row/cf/qual/val) and then de-serialize as a single byte[], and vice versa (without copying the bytes - other than copying them into network buffers where necessary)
Maybe something can be done with union types or extensions.
(Ironically with Writables this would be a trivial problem to solve)

Cell[In|Out]putStream would use PBs? If not, we're where we started :)

                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507868#comment-13507868 ] 

stack commented on HBASE-7233:
------------------------------

So, new Interface to replace Writable:

Interface HSerializable {
  // The serialization methods pb...
  mergeFrom(byte [])
  mergeFrom(InputStream)
  mergeDelimitedFrom(byte [])
  mergeDelimitedFrom(InputStream)
  parseFrom(byte [])
  parseFrom(InputStream)
  ...
  etc?
}
                

[jira] [Commented] (HBASE-7233) Remove Writable Interface from KeyValue

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508973#comment-13508973 ] 

Matt Corgan commented on HBASE-7233:
------------------------------------

Not sure I follow everything so far, but I'm wondering if KeyValue should just keep the Writable interface since KeyValue is the unit of input/output in certain map-reduce jobs.  The Cell interface improves on KeyValue when you are passing around blobs of many Cells (since they can share common row-prefixes, etc), but for map-reduce we are passing around individual Cells, so might as well just keep using KeyValue.  The Cells need to be standalone, so KeyValue may be required.

Are there benefits to removing Writable for this particular class beyond cleaning up the code?  Maybe saving 4-8 bytes of memory per KV in the memstore.
                