Posted to dev@hbase.apache.org by Nick Dimiduk <nd...@gmail.com> on 2013/02/21 19:35:27 UTC

Review request for HBASE-7692: Ordered byte[] serialization

Hi everyone,

I'm of the opinion that HBase should provide a mechanism for serializing
common Java types such that the serialized format sorts according to
the natural ordering of the type. I think many application efforts end up
building a custom, partial implementation of this kind of functionality on
their own. I think HBase should provide a canonical implementation of such
a serialization format so that third parties can reliably build on top of
HBase. This enables not just user applications but also other tools like
Pig and Hive. Implementations for
HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
compatible with similar features in Pig.
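
To make the sorting property concrete, here is a minimal sketch for a signed
long. It is an illustration only and is not taken from Orderly or the patch;
the class and method names are hypothetical. It assumes keys are compared as
unsigned bytes, lexicographically, which is how HBase orders row keys. Plain
big-endian two's complement places negative values after positive ones under
that comparison; flipping the sign bit before writing restores numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not Orderly's actual encoding or API.
final class OrderedLongSketch {

    /** Plain big-endian two's complement, as written by ByteBuffer.putLong. */
    static byte[] plain(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Same layout with the sign bit flipped, so negatives sort before positives. */
    static byte[] ordered(long v) {
        return plain(v ^ Long.MIN_VALUE);
    }

    /** Inverse of ordered(): flip the sign bit back after reading. */
    static long decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    /** Unsigned lexicographic byte comparison. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 (0xFF...) sorts after 1 (0x00...01).
        System.out.println(compareBytes(plain(-1L), plain(1L)) > 0);
        // Sign-flipped encoding: -1 sorts before 1, matching numeric order.
        System.out.println(compareBytes(ordered(-1L), ordered(1L)) < 0);
    }
}
```

A full library also has to cover variable-length types, nulls, and descending
order, which this sketch does not attempt.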

After implementing something similar on multiple occasions, I stumbled across
the Orderly <https://github.com/ndimiduk/orderly> library. It also
appears to have been adopted by other large projects, including
Lily <https://github.com/NGDATA/orderly>.
I engaged the library's author about some improvements, only to find out
he's now at Google and will no longer be maintaining it. Thus, I propose we
take it into HBase.

HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
patch that introduces Orderly into hbase-common under the orderly
namespace. I have an associated branch on
GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>, wherein
I've broken the patch out into multiple commits to ease review.
Please take a few minutes to give it a look.

Thanks,
Nick

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by lars hofhansl <la...@apache.org>.
I think we have to enable building stuff on top of HBase by having well-defined building blocks as part of HBase.
It seems to me that a canonical, supported byte representation for data types is such a building block.

-- Lars



________________________________
 From: Jonathan Hsieh <jo...@cloudera.com>
To: dev@hbase.apache.org 
Sent: Thursday, February 21, 2013 3:04 PM
Subject: Re: Review request for HBASE-7692: Ordered byte[] serialization
 
Nick,

While I believe having an order-preserving canonical serialization is a
good idea, from reading the mail and skimming the JIRA it is not
clear to me why this is inside HBase as part of hbase-common.

Why isn't this part of a library on top of HBase (a dependency for
Pig/Hive) instead of "inside" HBase?
Can't this functionality be done just from the client level?
What's the end goal here? Is the goal to replace the Bytes.toBytes(*)
methods to enforce the ordering?
If HBase has two mutually incompatible encodings "built-in", how does a
dev know to use one or the other later on?
If this is essentially a mega import of a library (300k... yikes), why not
make it a separate module instead of part of common?

Jon.




-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Stack <st...@duboce.net>.
On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <en...@apache.org> wrote:

> BTW, I also think that we need to have a SQL-type to java type to byte[]
> layer, but that is another discussion.
>

Say more Enis (either here or in a new thread).  It would just be types?
 Would it be in this Orderly module/package?

St.Ack

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Stack <st...@duboce.net>.
On Fri, Feb 22, 2013 at 6:04 PM, Matt Corgan <mc...@hotpads.com> wrote:

> All sounds fine to me Nick.  I had not looked into the internals enough to
> realize Builders were optional.
>
> Sorry if I'm looking too far down the road, but the future implications of
> including such low level building blocks could be hard to unwind.  Worth a
> little discussion at least.
>
>
Agreed.

Let me go off to look at the patch.  I third the Matt/Jon suggestion that
this be a jar we include, if possible w/ no hbase dependency, and
if that is not possible, as a standalone module w/ no interdependency....
but then I also appreciate and would like to support your improvement to
the HBase API project Nick.  Let me go look at the patch.

Good stuff,
St.Ack

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Matt Corgan <mc...@hotpads.com>.
All sounds fine to me Nick.  I had not looked into the internals enough to
realize Builders were optional.

Sorry if I'm looking too far down the road, but the future implications of
including such low level building blocks could be hard to unwind.  Worth a
little discussion at least.



On Fri, Feb 22, 2013 at 5:40 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> I think we're getting ahead of ourselves a bit here. First and foremost,
> I'm looking for consensus that HBase should ship with tools for serializing
> Java primitive types such that the byte[] representations maintain sorted
> order. This is primarily to the benefit of users of HBase in that 3rd party
> tools can enjoy interoperability in so much as is provided by HBase (ie, I
> can write a Pig script that writes a long and my Hive queries can read that
> value). Furthermore, the implementations of these tools benefit from the
> order-preserving representation.
>
> Assuming this capacity is agreed to be desirable, I propose the adoption of
> this orphaned community library. I have no particular love for the name of
> the package, nor am I concerned terribly about which module it resides in.
> Personally, I think it should ship with (explicitly or as a dependency of)
> the hbase-client module that will exist in 0.96. This is my preference
> because I think the client API should be extended to use said serialization
> format directly -- finally, HBase could "support" types other than byte[].
> That would be a much larger change, however, and I am not interested in
> pressing it for this initial discussion.
>
> This introduction does not in any way affect the existing Bytes utility.
> Server components can continue to use it for marshaling their own
> primitives. This library is of interest primarily to consumers of the HBase
> client API. (I'd prefer to see Bytes deprecated from client use entirely!)
> I do not think this library or its *optional* builder pattern should be
> used inside of the RegionServer. See also HBASE-7221 for another user who
> is asking for this kind of builder pattern. The Builder and Iterator utils
> are only a convenience API, providing sugar on top of the underlying
> StructRowKey implementation. Users interested in producing or consuming
> compound objects within a tight loop need not bother with either of them.
>
> As for the implementation details and dependency on Hadoop Writables: it is
> my opinion that so long as its dependencies are compatible with the rest of
> HBase, it's no big deal. From that perspective, dependence on Hadoop
> Writable implementations is entirely reasonable for an initial inclusion.
> If, down the road, we wish to reduce dependencies (a practice I generally
> support) and in so doing it becomes useful to change this implementation
> detail, so be it. Say, for example, we want to release an hbase-client jar
> that has no dependency on any Hadoop types, I say go for it. The patch I
> have contributed tags all of these classes as "Evolving" interfaces, and
> nothing is set in stone until a release manager and the community bless a
> new release. I'm happy to work with whomever is interested toward
> modernizing implementation details once the initial code is in place.
>
> Finally, the multiple patches business is nothing more than a
> reviewer convenience. I'm generally not excited about reviewing more than
> about 20 files at a time, on Review Board or otherwise. I assume others
> share the same opinion. As I offered on the ticket itself, I'm fine with
> accepting review on Review Board on the single large patch; I assumed
> github would make it easier, not harder.
>
> Thanks for your attention.
> -n
>
> On Fri, Feb 22, 2013 at 4:48 PM, Matt Corgan <mc...@hotpads.com> wrote:
>
> > I agree with Jonathan that ideally this would not depend on hbase or
> > hadoop.  Could we just replace Hadoop's BytesWritable with a new class
> that
> > does the same thing?
> >
> > I also have a concern about the way it builds the multi-field byte[] by
> > allocating somewhat expensive Builder objects, etc.  It's suitable for
> > application level code, but most of the innards of hbase regionserver
> > should be using tighter code for best performance and less garbage.
> >  Perhaps in a future issue we can separate the builder wrappers from
> their
> > internal byte converters so that hbase-server can use the lower-level
> byte
> > converters without the builder overhead.
> >
> >
> > On Fri, Feb 22, 2013 at 4:33 PM, Jonathan Hsieh <jo...@cloudera.com>
> wrote:
> >
> > > I think I misspoke slightly but basically agree with Matt's notion that
> > > this would end up being the place to pickup the orderly jar and that
> > > ideally it has no hbase-* dependencies.
> > >
> > > I actually feel that the hbase-orderly module is a sibling to
> > hbase-common
> > > and hbase-client. My initial thought is that this is ideally not
> depended
> > > upon by the hbase-client.  An app would use hbase-orderly and
> > hbase-client.
> > >
> > >
> > >  A simplified module dependency graph (excluding some details) would be
> > > (where -> == "depends on")
> > >
> > > app -> hbase-client, hbase-orderly
> > > hbase-client -> hbase-protocol, hbase-common, *-compat
> > > hbase-common -> none of the hbase-*
> > > hbase-orderly -> none of the hbase-*
> > >
> > > I don't quite understand what the multiple patches are for the module
> > > work (or is this follow on stuff that uses this)?  can you explain what
> > the
> > > breakdown would be?  since it isn't committed yet and should be self
> > > contained, just do the big import as a single patch?
> > >
> > > Thanks for bringing this up for discussion, Nick.
> > >
> > > Jon.
> > >
> > > On Fri, Feb 22, 2013 at 3:13 PM, Nick Dimiduk <nd...@gmail.com>
> > wrote:
> > >
> > > > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com>
> > > wrote:
> > > >
> > > > > To nitpick a little it wouldn't quite be a sibling of hbase-client
> > > > because
> > > > > hbase-client depends on hbase-common and hbase-protocol
> > > > >
> > > >
> > > > Actually, quite the contrary. I don't see this as being an external
> > > module
> > > > as much as integral to the client's use of HBase (read "client" as
> > > > "application consuming HBase", not "the HBase RPC client
> > > implementation").
> > > > Further, once HBase provides a suitable serialization format for
> > > > primitives, why not push them into the client API? IMHO, HBase really
> > > > should provide basic types for users at the Mutation layer. That,
> > > however,
> > > > belongs in an entirely separate ticket.
> > > >
> > > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org>
> > > wrote:
> > > > >
> > > > > > Yep the client will be fully separated as soon as rpc changes
> > > > > > are stabilized.  Until then keeping up the move patch was just
> too
> > > > > onerous.
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <
> jon@cloudera.com>
> > > > > wrote:
> > > > > >
> > > > > > > Nick,
> > > > > > >
> > > > > > > I'm +1 for it having its own module, and being a sibling of
> > > > > hbase-client.
> > > > > > >  I'm assuming the client stuff will happen before we release
> 0.96
> > > > since
> > > > > > it
> > > > > > > has been started.
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <
> > ndimiduk@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > You're absolutely correct: this library introduces
> client-side
> > > > > > > conventions
> > > > > > > > and is not needed from within the HMaster or RegionServer. Is
> > > > > > > > > the consensus that it should reside in its own module or be
> a
> > > > > sibling
> > > > > > to
> > > > > > > > the o.a.h.hbase.client source tree? I'm a little confused by
> > the
> > > > > > current
> > > > > > > > state of the modules; hbase-client looks empty while
> > > > > o.a.h.hbase.client
> > > > > > > > sits under hbase-server.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Nick
> > > > > > > >
> > > > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <
> > > jon@cloudera.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > So I buy the argument about this being included in hbase,
> but
> > > > > several
> > > > > > > of
> > > > > > > > > the questions still stand --
> > > > > > > > >
> > > > > > > > > Why is this part of hbase-common?  shouldn't this be just a
> > > > > > dependency
> > > > > > > of
> > > > > > > > > hbase-client module?  Does the hbase-server side need to
> > depend
> > > > on
> > > > > > > this?
> > > > > > > > >
> > > > > > > > > Since this is a large import of a currently isolated
> library,
> > > why
> > > > > not
> > > > > > > > make
> > > > > > > > > it a separate module instead of part of hbase-common?  This
> > > would
> > > > > > > > enforce a
> > > > > > > > > boundary that will prevent pollution from circular
> > > dependencies.
> > > > > > > > >
> > > > > > > > > Jon.
> > > > > > > > >
> > > > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <
> > > enis@apache.org>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I think this belongs in core HBase, as a replacement to
> > > Bytes,
> > > > > > which
> > > > > > > > > should
> > > > > > > > > > be deprecated eventually. We have a Bytes utility which
> is
> > > > > supposed
> > > > > > > to
> > > > > > > > > > convert basic java types to byte[]'s, but it does not
> work
> > > for
> > > > > > signed
> > > > > > > > > > numbers.
> > > > > > > > > >
> > > > > > > > > > We already know that all of the clients, Hive, Pig,
> > Phoenix,
> > > > have
> > > > > > to
> > > > > > > > have
> > > > > > > > > > at least java type -> byte[] conversion utilities, and I
> > > think
> > > > it
> > > > > > is
> > > > > > > > > > HBase's job to supply one so that different clients can
> > > > > > interoperate.
> > > > > > > > > Since
> > > > > > > > > > internally we are also relying on serializing java types,
> > we
> > > > need
> > > > > > > that
> > > > > > > > > > library in the core.
> > > > > > > > > >
> > > > > > > > > > BTW, I also think that we need to have a SQL-type to java
> > > type
> > > > to
> > > > > > > > byte[]
> > > > > > > > > > layer, but that is another discussion.
> > > > > > > > > >
> > > > > > > > > > Enis
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > > // Software Engineer, Cloudera
> > > > > > > > > // jon@cloudera.com
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > // Jonathan Hsieh (shay)
> > > > > > > // Software Engineer, Cloudera
> > > > > > > // jon@cloudera.com
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com
> > >
> >
>

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Stack <st...@duboce.net>.
Thanks for the nice write up Mr. Nick.
St.Ack

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
I think we're getting ahead of ourselves a bit here. First and foremost,
I'm looking for consensus that HBase should ship with tools for serializing
Java primitive types such that the byte[] representations maintain sorted
order. This is primarily to the benefit of users of HBase in that third-party
tools can enjoy interoperability insofar as it is provided by HBase (i.e., I
can write a Pig script that writes a long and my Hive queries can read that
value). Furthermore, the implementations of these tools benefit from the
order-preserving representation.

Assuming this capacity is agreed to be desirable, I propose the adoption of
this orphaned community library. I have no particular love for the name of
the package, nor am I concerned terribly about which module it resides in.
Personally, I think it should ship with (explicitly or as a dependency of)
the hbase-client module that will exist in 0.96. This is my preference
because I think the client API should be extended to use said serialization
format directly -- finally, HBase could "support" types other than byte[].
That would be a much larger change, however, and I am not interested in
pressing it for this initial discussion.

This introduction does not in any way affect the existing Bytes utility.
Server components can continue to use it for marshaling their own
primitives. This library is of interest primarily to consumers of the HBase
client API. (I'd prefer to see Bytes deprecated from client use entirely!)
I do not think this library or its *optional* builder pattern should be
used inside of the RegionServer. See also HBASE-7221 for another user who
is asking for this kind of builder pattern. The Builder and Iterator utils
are only a convenience API, providing sugar on top of the underlying
StructRowKey implementation. Users interested in producing or consuming
compound objects within a tight loop need not bother with either of them.
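
As a rough picture of what a compound key without any builder could look like,
here is a sketch that writes a (string, long) pair straight into one
pre-sized buffer. It is not Orderly's StructRowKey API; the class, the method
names, and the byte layout are all hypothetical. It assumes the string
contains no NUL character, so a single 0x00 terminator keeps shorter strings
sorting before their extensions, and it flips the sign bit of the long so
that negative values sort first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not Orderly's StructRowKey or its byte format.
final class CompoundKeySketch {

    /** Encodes (name, ts) so that byte order follows (name, ts) order. */
    static byte[] encode(String name, long ts) {
        if (name.indexOf('\0') >= 0) {
            // Assumption of this sketch: NUL is reserved as the terminator.
            throw new IllegalArgumentException("name must not contain NUL");
        }
        byte[] s = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(s.length + 1 + Long.BYTES);
        buf.put(s);
        buf.put((byte) 0x00);              // terminator, lower than any string byte
        buf.putLong(ts ^ Long.MIN_VALUE);  // sign-flipped big-endian long
        return buf.array();
    }

    /** Unsigned lexicographic byte comparison. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // "a" sorts before "ab" regardless of the trailing long.
        System.out.println(compareBytes(encode("a", 5L), encode("ab", -3L)) < 0);
        // With equal names, the long decides the order.
        System.out.println(compareBytes(encode("a", -3L), encode("a", 5L)) < 0);
    }
}
```

The only allocation per key is the output array itself, which is the kind of
path a tight loop would want.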

As for the implementation details and dependency on Hadoop Writables: it is
my opinion that so long as its dependencies are compatible with the rest of
HBase, it's no big deal. From that perspective, dependence on Hadoop
Writable implementations is entirely reasonable for an initial inclusion.
If, down the road, we wish to reduce dependencies (a practice I generally
support) and in so doing it becomes useful to change this implementation
detail, so be it. Say, for example, we want to release an hbase-client jar
that has no dependency on any Hadoop types, I say go for it. The patch I
have contributed tags all of these classes as "Evolving" interfaces, and
nothing is set in stone until a release manager and the community bless a
new release. I'm happy to work with whoever is interested toward
modernizing implementation details once the initial code is in place.

Finally, the multiple patches business is nothing more than a
reviewer convenience. I'm generally not excited about reviewing more than
about 20 files at a time, on Review Board or otherwise. I assume others
share the same opinion. As I offered on the ticket itself, I'm fine with
accepting review on Review Board on the single large patch; I assumed
GitHub would make it easier, not harder.

Thanks for your attention.
-n

On Fri, Feb 22, 2013 at 4:48 PM, Matt Corgan <mc...@hotpads.com> wrote:

> I agree with Jonathan that ideally this would not depend on hbase or
> hadoop.  Could we just replace Hadoop's BytesWritable with a new class that
> does the same thing?
>
> I also have a concern about the way it builds the multi-field byte[] by
> allocating somewhat expensive Builder objects, etc.  It's suitable for
> application level code, but most of the innards of hbase regionserver
> should be using tighter code for best performance and less garbage.
>  Perhaps in a future issue we can separate the builder wrappers from their
> internal byte converters so that hbase-server can use the lower-level byte
> converters without the builder overhead.
>
>
> On Fri, Feb 22, 2013 at 4:33 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:
>
> > I think I misspoke slightly but basically agree with Matt's notion that
> > this would end up being the place to pickup the orderly jar and that
> > ideally it has no hbase-* dependencies.
> >
> > I actually feel that the hbase-orderly module is a sibling to
> hbase-common
> > and hbase-client. My initial thought is that this is ideally not depended
> > upon by the hbase-client.  An app would use hbase-orderly and
> hbase-client.
> >
> >
> >  A simplified module dependency graph (excluding some details) would be
> > (where -> == "depends on")
> >
> > app -> hbase-client, hbase-orderly
> > hbase-client -> hbase-protocol, hbase-common, *-compat
> > hbase-common -> none of the hbase-*
> > hbase-orderly -> none of the hbase-*
> >
> > I don't quite understand what the multiple patches are for the module
> > work (or is this follow on stuff that uses this)?  can you explain what
> the
> > breakdown would be?  since it isn't committed yet and should be self
> > contained, just do the big import as a single patch?
> >
> > Thanks for bringing this up for discussion, Nick.
> >
> > Jon.
> >
> > On Fri, Feb 22, 2013 at 3:13 PM, Nick Dimiduk <nd...@gmail.com>
> wrote:
> >
> > > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com>
> > wrote:
> > >
> > > > To nitpick a little it wouldn't quite be a sibling of hbase-client
> > > because
> > > > hbase-client depends on hbase-common and hbase-protocol
> > > >
> > >
> > > Actually, quite the contrary. I don't see this as being an external
> > module
> > > as much as integral to the client's use of HBase (read "client" as
> > > "application consuming HBase", not "the HBase RPC client
> > implementation").
> > > Further, once HBase provides a suitable serialization format for
> > > primitives, why not push them into the client API? IMHO, HBase really
> > > should provide basic types for users at the Mutation layer. That,
> > however,
> > > belongs in an entirely separate ticket.
> > >
> > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org>
> > wrote:
> > > >
> > > > > Yep the client will be fully separated as soon as rpc changes
> > > > > are stabilized.  Until then keeping up the move patch was just too
> > > > onerous.
> > > > >
> > > > >
> > > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jo...@cloudera.com>
> > > > wrote:
> > > > >
> > > > > > Nick,
> > > > > >
> > > > > > I'm +1 for it having its own module, and being a sibling of
> > > > hbase-client.
> > > > > >  I'm assuming the client stuff will happen before we release 0.96
> > > since
> > > > > it
> > > > > > has been started.
> > > > > >
> > > > > > Jon.
> > > > > >
> > > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <
> ndimiduk@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > You're absolutely correct: this library introduces client-side
> > > > > > conventions
> > > > > > > and is not needed from within the HMaster or RegionServer. Is
> > > > > > > the consensus that it should reside in it's own module or be a
> > > > sibling
> > > > > to
> > > > > > > the o.a.h.hbase.client source tree? I'm a little confused by
> the
> > > > > current
> > > > > > > state of the modules; hbase-client looks empty while
> > > > o.a.h.hbase.client
> > > > > > > sits under hbase-server.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nick
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <
> > jon@cloudera.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > So I buy the argument about this being included in hbase, but
> > > > several
> > > > > > of
> > > > > > > > the questions still stand --
> > > > > > > >
> > > > > > > > Why is this part of hbase-common?  shouldn't this be just a
> > > > > dependency
> > > > > > of
> > > > > > > > hbase-client module?  Does the hbase-server side need to
> depend
> > > on
> > > > > > this?
> > > > > > > >
> > > > > > > > Since this is a large import of a currently isolated library,
> > why
> > > > not
> > > > > > > make
> > > > > > > > it a separate module instead of part of hbase-common?  This
> > would
> > > > > > > enforce a
> > > > > > > > boundary that will prevent pollution from circular
> > dependencies.
> > > > > > > >
> > > > > > > > Jon.
> > > > > > > >
> > > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <
> > enis@apache.org>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > I think this belongs in core HBase, as a replacement to
> > Bytes,
> > > > > which
> > > > > > > > should
> > > > > > > > > be deprecated eventually. We have a Bytes utility which is
> > > > supposed
> > > > > > to
> > > > > > > > > convert basic java types to byte[]'s, but it does not work
> > for
> > > > > signed
> > > > > > > > > numbers.
> > > > > > > > >
> > > > > > > > > We already know that all of the clients, Hive, Pig,
> Phoenix,
> > > have
> > > > > to
> > > > > > > have
> > > > > > > > > at least java type -> byte[] conversion utilities, and I
> > think
> > > it
> > > > > is
> > > > > > > > > HBase's job to supply one so that different clients can
> > > > > interoperate.
> > > > > > > > Since
> > > > > > > > > internally we are also relying on serializing java types,
> we
> > > need
> > > > > > that
> > > > > > > > > library in the core.
> > > > > > > > >
> > > > > > > > > BTW, I also think that we need to have a SQL-type to java
> > type
> > > to
> > > > > > > byte[]
> > > > > > > > > layer, but that is another discussion.
> > > > > > > > >
> > > > > > > > > Enis
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <
> > > > jon@cloudera.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Nick,
> > > > > > > > > >
> > > > > > > > > > While I believe having an order-preserving canonical
> > > > > serialization
> > > > > > > is a
> > > > > > > > > > good idea,  from doing a read of the mail and a skim of
> the
> > > > jira
> > > > > it
> > > > > > > is
> > > > > > > > > not
> > > > > > > > > > clear to my why this is inside hbase as part of
> > hbase-common.
> > > > > > > > > >
> > > > > > > > > > Why isn't this part of a library on top of hbase (a
> > > dependency
> > > > > for
> > > > > > > > > > Pig/Hive) instead of "inside" hbase?
> > > > > > > > > > Can't this functionality be done just from the client
> > level?
> > > > > > > > > > What's the end goal hee? Is the goal here to replace the
> > > > > > > > Bytes.toBytes(*)
> > > > > > > > > > methods to enforced the ordering?
> > > > > > > > > > If I HBase has two mutually incompatible encodings
> > > "built-in",
> > > > > how
> > > > > > > > does a
> > > > > > > > > > dev know to use one or the other later on?
> > > > > > > > > > If this is essentially a mega import of a library (300k..
> > > > yikes)
> > > > > ,
> > > > > > > why
> > > > > > > > > not
> > > > > > > > > > make it a separate module instead of part of common?
> > > > > > > > > >
> > > > > > > > > > Jon.
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <
> > > > > ndimiduk@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > I'm of the opinion that HBase should provide a
> mechanism
> > > for
> > > > > > > > > serializing
> > > > > > > > > > > common java types such that the serialized format sorts
> > > > > according
> > > > > > > the
> > > > > > > > > > > the natural ordering of the type. I think many
> > application
> > > > > > efforts
> > > > > > > > end
> > > > > > > > > up
> > > > > > > > > > > building a custom, partial implementation of this kind
> of
> > > > > > > > functionality
> > > > > > > > > > on
> > > > > > > > > > > their own. I think HBase should provide a canonical
> > > > > > implementation
> > > > > > > of
> > > > > > > > > > such
> > > > > > > > > > > a serialization format so that third-parties can
> reliably
> > > > build
> > > > > > on
> > > > > > > > top
> > > > > > > > > of
> > > > > > > > > > > HBase. Not just user applications, but other tools like
> > Pig
> > > > and
> > > > > > > Hive
> > > > > > > > > are
> > > > > > > > > > > also enabled. Implementations for
> > > > > > > > > > > HIVE-3634<
> > https://issues.apache.org/jira/browse/HIVE-3634
> > > >,
> > > > > > > > > > > HIVE-2599 <
> > https://issues.apache.org/jira/browse/HIVE-2599
> > > >,
> > > > > or
> > > > > > > > > > > HIVE-2903<
> > https://issues.apache.org/jira/browse/HIVE-2903
> > > > > >could
> > > > > > be
> > > > > > > > > > > compatible with similar features in Pig.
> > > > > > > > > > >
> > > > > > > > > > > After implementing something similar on multiple
> > occasions,
> > > > > > > stumbled
> > > > > > > > > > across
> > > > > > > > > > > the Orderly <https://github.com/ndimiduk/orderly>
> > library.
> > > > > It's
> > > > > > > also
> > > > > > > > > > > appears to have been adopted by other large projects,
> > > > including
> > > > > > > > > > > Lily<https://github.com/NGDATA/orderly>.
> > > > > > > > > > > I've engaged the library's author for some improvements
> > > only
> > > > to
> > > > > > > find
> > > > > > > > > out
> > > > > > > > > > > he's now at Google and will no longer be maintaining
> it.
> > > > Thus,
> > > > > I
> > > > > > > > > propose
> > > > > > > > > > we
> > > > > > > > > > > take it into HBase.
> > > > > > > > > > >
> > > > > > > > > > > HBASE-7692 <
> > > https://issues.apache.org/jira/browse/HBASE-7692
> > > > >
> > > > > > > > > includes a
> > > > > > > > > > > patch that introduces Orderly into hbase-common under
> the
> > > > > orderly
> > > > > > > > > > > namespace. I have an associated branch on
> > > > > > > > > > > gihub<
> > > > > > > > > >
> > > > > >
> > https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization
> > > > > > > > > > > >wherein
> > > > > > > > > > > I've broken the patch out into multiple commits to ease
> > > > review.
> > > > > > > > > > > Please take a few minutes to give it a look.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Nick
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > > > // Software Engineer, Cloudera
> > > > > > > > > > // jon@cloudera.com
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > // Software Engineer, Cloudera
> > > > > > > > // jon@cloudera.com
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > // Jonathan Hsieh (shay)
> > > > > > // Software Engineer, Cloudera
> > > > > > // jon@cloudera.com
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
> >
>

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Matt Corgan <mc...@hotpads.com>.
I agree with Jonathan that ideally this would not depend on hbase or
hadoop.  Could we just replace Hadoop's BytesWritable with a new class that
does the same thing?

I also have a concern about the way it builds the multi-field byte[] by
allocating somewhat expensive Builder objects, etc.  It's suitable for
application-level code, but most of the innards of the hbase regionserver
should be using tighter code for best performance and less garbage.
Perhaps in a future issue we can separate the builder wrappers from their
internal byte converters so that hbase-server can use the lower-level byte
converters without the builder overhead.
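
To make the "lower-level byte converter" idea concrete, here is a minimal
sketch, assuming a fixed-width 32-bit integer field. The class and method
names (OrderedIntCodec, encode, decode, compareUnsigned) are hypothetical and
are not taken from Orderly or HBase. It flips the sign bit before writing
big-endian bytes, so that an unsigned byte-by-byte comparison agrees with the
natural int ordering, and it writes into a caller-supplied buffer so no
per-field objects are allocated.

```java
/** Hypothetical low-level converter; an illustration, not Orderly's or HBase's API. */
final class OrderedIntCodec {
  private OrderedIntCodec() {}

  /**
   * Writes value into dst[offset .. offset+4) and returns the next free offset.
   * Flipping the sign bit maps Integer.MIN_VALUE to 0x00000000 and
   * Integer.MAX_VALUE to 0xFFFFFFFF, so negatives sort before positives.
   */
  static int encode(int value, byte[] dst, int offset) {
    int flipped = value ^ Integer.MIN_VALUE;
    dst[offset]     = (byte) (flipped >>> 24);
    dst[offset + 1] = (byte) (flipped >>> 16);
    dst[offset + 2] = (byte) (flipped >>> 8);
    dst[offset + 3] = (byte) flipped;
    return offset + 4;
  }

  /** Reads back a value written by encode. */
  static int decode(byte[] src, int offset) {
    int flipped = ((src[offset] & 0xff) << 24)
        | ((src[offset + 1] & 0xff) << 16)
        | ((src[offset + 2] & 0xff) << 8)
        | (src[offset + 3] & 0xff);
    return flipped ^ Integer.MIN_VALUE;
  }

  /** Unsigned, byte-by-byte comparison, the ordering applied to row keys. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

A two-field key could then be built with one 8-byte array and two chained
encode calls, each call starting at the offset returned by the previous one.
A builder-style wrapper for application code could sit on top of static
methods like these.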


Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I think I misspoke slightly but basically agree with Matt's notion that
this would end up being the place to pick up the orderly jar and that
ideally it has no hbase-* dependencies.

I actually feel that the hbase-orderly module is a sibling to hbase-common
and hbase-client. My initial thought is that this is ideally not depended
upon by the hbase-client.  An app would use hbase-orderly and hbase-client.


A simplified module dependency graph (excluding some details) would be
(where -> means "depends on"):

app -> hbase-client, hbase-orderly
hbase-client -> hbase-protocol, hbase-common, *-compat
hbase-common -> none of the hbase-*
hbase-orderly -> none of the hbase-*

I don't quite understand what the multiple patches are for the module
work (or is this follow-on stuff that uses this)?  Can you explain what the
breakdown would be?  Since it isn't committed yet and should be
self-contained, why not just do the big import as a single patch?

Thanks for bringing this up for discussion, Nick.

Jon.


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com> wrote:

> To nitpick a little it wouldn't quite be a sibling of hbase-client because
> hbase-client depends on hbase-common and hbase-protocol
>

Actually, quite the contrary. I don't see this as being an external module
as much as integral to the client's use of HBase (read "client" as
"application consuming HBase", not "the HBase RPC client implementation").
Further, once HBase provides a suitable serialization format for
primitives, why not push them into the client API? IMHO, HBase really
should provide basic types for users at the Mutation layer. That, however,
belongs in an entirely separate ticket.

On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org> wrote:
>
> > Yep the client will be fully separated as soon as rpc changes
> > are stabilized.  Until then keeping up the move patch was just too
> onerous.
> >
> >
> > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jo...@cloudera.com>
> wrote:
> >
> > > Nick,
> > >
> > > I'm +1 for it having its own module, and being a sibling of
> hbase-client.
> > >  I'm assuming the client stuff will happen before we release 0.96 since
> > it
> > > has been started.
> > >
> > > Jon.
> > >
> > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <nd...@gmail.com>
> > wrote:
> > >
> > > > You're absolutely correct: this library introduces client-side
> > > conventions
> > > > and is not needed from within the HMaster or RegionServer. Is
> > > > the consensus that it should reside in its own module or be a
> sibling
> > to
> > > > the o.a.h.hbase.client source tree? I'm a little confused by the
> > current
> > > > state of the modules; hbase-client looks empty while
> o.a.h.hbase.client
> > > > sits under hbase-server.
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jo...@cloudera.com>
> > > wrote:
> > > >
> > > > > So I buy the argument about this being included in hbase, but
> several
> > > of
> > > > > the questions still stand --
> > > > >
> > > > > Why is this part of hbase-common?  shouldn't this be just a
> > dependency
> > > of
> > > > > hbase-client module?  Does the hbase-server side need to depend on
> > > this?
> > > > >
> > > > > Since this is a large import of a currently isolated library, why
> not
> > > > make
> > > > > it a separate module instead of part of hbase-common?  This would
> > > > enforce a
> > > > > boundary that will prevent pollution from circular dependencies.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <en...@apache.org>
> > > wrote:
> > > > >
> > > > > > I think this belongs in core HBase, as a replacement to Bytes,
> > which
> > > > > should
> > > > > > be deprecated eventually. We have a Bytes utility which is
> supposed
> > > to
> > > > > > convert basic java types to byte[]'s, but it does not work for
> > signed
> > > > > > numbers.
> > > > > >
> > > > > > We already know that all of the clients, Hive, Pig, Phoenix, have
> > to
> > > > have
> > > > > > at least java type -> byte[] conversion utilities, and I think it
> > is
> > > > > > HBase's job to supply one so that different clients can
> > interoperate.
> > > > > Since
> > > > > > internally we are also relying on serializing java types, we need
> > > that
> > > > > > library in the core.
> > > > > >
> > > > > > BTW, I also think that we need to have a SQL-type to java type to
> > > > byte[]
> > > > > > layer, but that is another discussion.
> > > > > >
> > > > > > Enis
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <
> jon@cloudera.com>
> > > > > wrote:
> > > > > >
> > > > > > > Nick,
> > > > > > >
> > > > > > > While I believe having an order-preserving canonical
> > serialization
> > > > is a
> > > > > > > good idea,  from doing a read of the mail and a skim of the
> jira
> > it
> > > > is
> > > > > > not
> > > > > > > clear to my why this is inside hbase as part of hbase-common.
> > > > > > >
> > > > > > > Why isn't this part of a library on top of hbase (a dependency
> > for
> > > > > > > Pig/Hive) instead of "inside" hbase?
> > > > > > > Can't this functionality be done just from the client level?
> > > > > > > What's the end goal hee? Is the goal here to replace the
> > > > > Bytes.toBytes(*)
> > > > > > > methods to enforced the ordering?
> > > > > > > If I HBase has two mutually incompatible encodings "built-in",
> > how
> > > > > does a
> > > > > > > dev know to use one or the other later on?
> > > > > > > If this is essentially a mega import of a library (300k..
> yikes)
> > ,
> > > > why
> > > > > > not
> > > > > > > make it a separate module instead of part of common?
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <
> > ndimiduk@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I'm of the opinion that HBase should provide a mechanism for
> > > > > > serializing
> > > > > > > > common java types such that the serialized format sorts
> > according
> > > > the
> > > > > > > > the natural ordering of the type. I think many application
> > > efforts
> > > > > end
> > > > > > up
> > > > > > > > building a custom, partial implementation of this kind of
> > > > > functionality
> > > > > > > on
> > > > > > > > their own. I think HBase should provide a canonical
> > > implementation
> > > > of
> > > > > > > such
> > > > > > > > a serialization format so that third-parties can reliably
> build
> > > on
> > > > > top
> > > > > > of
> > > > > > > > HBase. Not just user applications, but other tools like Pig
> and
> > > > Hive
> > > > > > are
> > > > > > > > also enabled. Implementations for
> > > > > > > > HIVE-3634<https://issues.apache.org/jira/browse/HIVE-3634>,
> > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>,
> > or
> > > > > > > > HIVE-2903<https://issues.apache.org/jira/browse/HIVE-2903
> > >could
> > > be
> > > > > > > > compatible with similar features in Pig.
> > > > > > > >
> > > > > > > > After implementing something similar on multiple occasions,
> > > > stumbled
> > > > > > > across
> > > > > > > > the Orderly <https://github.com/ndimiduk/orderly> library.
> > It's
> > > > also
> > > > > > > > appears to have been adopted by other large projects,
> including
> > > > > > > > Lily<https://github.com/NGDATA/orderly>.
> > > > > > > > I've engaged the library's author for some improvements only
> to
> > > > find
> > > > > > out
> > > > > > > > he's now at Google and will no longer be maintaining it.
> Thus,
> > I
> > > > > > propose
> > > > > > > we
> > > > > > > > take it into HBase.
> > > > > > > >
> > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692
> >
> > > > > > includes a
> > > > > > > > patch that introduces Orderly into hbase-common under the
> > orderly
> > > > > > > > namespace. I have an associated branch on
> > > > > > > > gihub<
> > > > > > >
> > > https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization
> > > > > > > > >wherein
> > > > > > > > I've broken the patch out into multiple commits to ease
> review.
> > > > > > > > Please take a few minutes to give it a look.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Nick
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > // Jonathan Hsieh (shay)
> > > > > > > // Software Engineer, Cloudera
> > > > > > > // jon@cloudera.com
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // Software Engineer, Cloudera
> > > > > // jon@cloudera.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com
> > >
> >
>
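
Enis's point quoted above, that a plain java type -> byte[] conversion does not sort correctly for signed numbers, can be illustrated with a small sketch. This is not Orderly's actual wire format, and the class and method names are hypothetical. It only shows the general idea: big-endian two's complement puts negative ints after positive ones under unsigned byte comparison, and flipping the sign bit restores the natural order.

```java
import java.nio.ByteBuffer;

public class OrderedIntSketch {
    // Plain big-endian two's complement encoding of an int.
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Flip the sign bit so that unsigned byte order matches signed int order.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(): flip the sign bit back.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 (FF FF FF FF) sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(plain(-1), plain(1)) > 0);
        // Sign-flipped encoding: -1 sorts before 1, as expected.
        System.out.println(compareUnsigned(ordered(-1), ordered(1)) < 0);
        // The encoding round-trips.
        System.out.println(decodeOrdered(ordered(-42)) == -42);
    }
}
```

All three lines print true. A real library would also have to handle floating point, variable-length values, nulls, and descending order, which this sketch ignores.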

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Jesse Yates <je...@gmail.com>.
+1 on all Matt's comments
-------------------
Jesse Yates
@jesse_yates
jyates.github.com


On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com> wrote:

> To nitpick a little it wouldn't quite be a sibling of hbase-client because
> hbase-client depends on hbase-common and hbase-protocol while this new one
> will not depend on anything.  Would hbase-server be able to see it?  Would
> it basically be a standalone module being maintained by HBase?
>
> Also, assuming the original Orderly library goes unmaintained and we want
> people to use it, this will be the primary place to get it.  Having no
> dependencies on other hbase modules is important for people who want to use
> the Orderly library for something unrelated to hbase.  For example, a web
> application that logs data in this format but not directly to hbase.

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Stack <st...@duboce.net>.
On Fri, Feb 22, 2013 at 10:48 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan <mc...@hotpads.com> wrote:
>
> > >
> > > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> > > hbase-common.
> >
> > Oh, interesting.  Could we inline the code from Bytes.java and somehow get
> > rid of the ImmutableBytesWritable.  Like calling packages can add
> > ImmutableBytesWritable functionality on top if they want to?
>
>
> I'll need to do a more thorough evaluation, but a cursory glance indicates
> use of Bytes could be replaced by arraycopy. ImmutableBytesWritable is used
> mostly as a convenient wrapper over byte[], and may well
> be replaceable with Hadoop's BytesWritable.
>
>
Bytes is a bit messy.  Has bits of ByteBuffer and Unsafe going on inside it.

The IBW exists because we were frequently burned by the fact that a BW could
change under you when you least expected it.
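
A generic sketch of the kind of hazard described above, for anyone unfamiliar with it. The MutableBytes class below is a hypothetical stand-in written for this example, not Hadoop's BytesWritable, and the scenario (a wrapper mutated while it is in use as a map key) is an assumption about one way such a change can bite.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MutableKeySketch {
    // Hypothetical mutable byte[] wrapper; NOT Hadoop's class.
    static final class MutableBytes {
        private byte[] bytes;

        MutableBytes(byte[] bytes) {
            this.bytes = bytes;
        }

        // Re-point the wrapper at different contents.
        void set(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableBytes
                && Arrays.equals(bytes, ((MutableBytes) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // Returns true if the entry can still be found under its original contents.
    static boolean foundAfterMutation() {
        Map<MutableBytes, String> rows = new HashMap<>();
        MutableBytes key = new MutableBytes(new byte[] {1});
        rows.put(key, "row-1");

        // Some other code reuses the same wrapper for new contents.
        key.set(new byte[] {2});

        // The stored key no longer equals {1}, so the lookup misses.
        return rows.containsKey(new MutableBytes(new byte[] {1}));
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

This prints false: the entry is still in the map but can no longer be reached by its original contents.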



> > Seems like something as low level as rearranging bytes should be dependency
> > free.
> >
>
> The implementation makes heavy use of Hadoop Writables, but the
> dependencies on HBase instances are mostly convenience.
>
>
Orderly makes use of Writables?

St.Ack

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Ted Yu <yu...@gmail.com>.
Thanks Nick for carrying this through.

My plea to reviewers: if you disagree with putting orderly in its own
module, please say so now.

On Fri, Feb 22, 2013 at 11:37 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> I'm working through the code that will produce a patch placing orderly in
> its own module. A question to reviewers: would you prefer I create separate
> JIRA/tasks for each of the individual patches? Will that be easier to
> review than dumping my squashed patch onto this ticket and asking you to
> look at github? Having this broken out into multiple tickets, I would feel
> better about using review board to aggregate comments.
>
> Please advise.
> Nick

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
I'm working through the code that will produce a patch placing orderly in
its own module. A question to reviewers: would you prefer I create separate
JIRA/tasks for each of the individual patches? Will that be easier to
review than dumping my squashed patch onto this ticket and asking you to
look at github? Having this broken out into multiple tickets, I would feel
better about using review board to aggregate comments.

Please advise.
Nick

On Fri, Feb 22, 2013 at 10:48 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan <mc...@hotpads.com> wrote:
>
>> >
>> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
>> > hbase-common.
>>
>> Oh, interesting.  Could we inline the code from Bytes.java and somehow get
>> rid of the ImmutableBytesWritable.  Like calling packages can add
>> ImmutableBytesWritable functionality on top if they want to?
>
>
> I'll need to do a more thorough evaluation, but a cursory glance indicates
> use of Bytes could be replaced by arraycopy. ImmutableBytesWritable is used
> mostly as a convenient wrapper over byte[], and may well
> be replaceable with Hadoop's BytesWritable.
>
>> Seems like something as low level as rearranging bytes should be
>> dependency free.
>>
>
> The implementation makes heavy use of Hadoop Writables, but the
> dependencies on HBase instances are mostly convenience.
>
>  On Fri, Feb 22, 2013 at 10:04 AM, Nick Dimiduk <nd...@gmail.com>
>> wrote:
>>
>> > Inline.
>> >
>> > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com>
>> wrote:
>> >
>> > > To nitpick a little it wouldn't quite be a sibling of hbase-client
>> > because
>> > > hbase-client depends on hbase-common and hbase-protocol while this new
>> > one
>> > > will not depend on anything.  Would hbase-server be able to see it?
>> >  Would
>> > > it basically be a standalone module being maintained by HBase?
>> > >
>> >
>> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
>> > hbase-common.
>> >
>> > Also, assuming the original Orderly library goes unmaintained and we
>> want
>> > > people to use it, this will be the primary place to get it.  Having no
>> > > dependencies on other hbase modules is important for people who want
>> to
>> > use
>> > > the Orderly library for something unrelated to hbase.  For example, a
>> web
>> > > application that logs data in this format but not directly to hbase.
>> > >
>> >
>> > Orderly has gone unmaintained. The only fork with any activity that I'm
>> > aware of is my own. I'd much rather see it gain the publicity,
>> > additional scrutiny, wider adoption than continue as a pet-project.
>> >
>> > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org>
>> wrote:
>> > >
>> > > > Yep the client will be fully separated as soon as rpc changes
>> > > > are stabilized.  Until then keeping up the move patch was just too
>> > > onerous.
>> > > >
>> > > >
>> > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jo...@cloudera.com>
>> > > wrote:
>> > > >
>> > > > > Nick,
>> > > > >
>> > > > > I'm +1 for it having its own module, and being a sibling of
>> > > hbase-client.
>> > > > >  I'm assuming the client stuff will happen before we release 0.96
>> > since
>> > > > it
>> > > > > has been started.
>> > > > >
>> > > > > Jon.
>> > > > >
>> > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <ndimiduk@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > You're absolutely correct: this library introduces client-side
>> > > > > conventions
>> > > > > > and is not needed from within the HMaster or RegionServer. Is
>> > > > > > the consensus that it should reside in it's own module or be a
>> > > sibling
>> > > > to
>> > > > > > the o.a.h.hbase.client source tree? I'm a little confused by the
>> > > > current
>> > > > > > state of the modules; hbase-client looks empty while
>> > > o.a.h.hbase.client
>> > > > > > sits under hbase-server.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Nick
>> > > > > >
>> > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <
>> jon@cloudera.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > So I buy the argument about this being included in hbase, but
>> > > several
>> > > > > of
>> > > > > > > the questions still stand --
>> > > > > > >
>> > > > > > > Why is this part of hbase-common?  shouldn't this be just a
>> > > > dependency
>> > > > > of
>> > > > > > > hbase-client module?  Does the hbase-server side need to
>> depend
>> > on
>> > > > > this?
>> > > > > > >
>> > > > > > > Since this is a large import of a currently isolated library,
>> why
>> > > not
>> > > > > > make
>> > > > > > > it a separate module instead of part of hbase-common?  This
>> would
>> > > > > > enforce a
>> > > > > > > boundary that will prevent pollution from circular
>> dependencies.
>> > > > > > >
>> > > > > > > Jon.
>> > > > > > >
>> > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <
>> enis@apache.org>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > I think this belongs in core HBase, as a replacement to
>> Bytes,
>> > > > which
>> > > > > > > should
>> > > > > > > > be deprecated eventually. We have a Bytes utility which is
>> > > supposed
>> > > > > to
>> > > > > > > > convert basic java types to byte[]'s, but it does not work
>> for
>> > > > signed
>> > > > > > > > numbers.
>> > > > > > > >
>> > > > > > > > We already know that all of the clients, Hive, Pig, Phoenix,
>> > have
>> > > > to
>> > > > > > have
>> > > > > > > > at least java type -> byte[] conversion utilities, and I
>> think
>> > it
>> > > > is
>> > > > > > > > HBase's job to supply one so that different clients can
>> > > > interoperate.
>> > > > > > > Since
>> > > > > > > > internally we are also relying on serializing java types, we
>> > need
>> > > > > that
>> > > > > > > > library in the core.
>> > > > > > > >
>> > > > > > > > BTW, I also think that we need to have a SQL-type to java
>> type
>> > to
>> > > > > > byte[]
>> > > > > > > > layer, but that is another discussion.
>> > > > > > > >
>> > > > > > > > Enis
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <
>> > > jon@cloudera.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Nick,
>> > > > > > > > >
>> > > > > > > > > While I believe having an order-preserving canonical
>> > > > serialization
>> > > > > > is a
>> > > > > > > > > good idea,  from doing a read of the mail and a skim of
>> the
>> > > jira
>> > > > it
>> > > > > > is
>> > > > > > > > not
>> > > > > > > > > > clear to me why this is inside hbase as part of
>> hbase-common.
>> > > > > > > > >
>> > > > > > > > > Why isn't this part of a library on top of hbase (a
>> > dependency
>> > > > for
>> > > > > > > > > Pig/Hive) instead of "inside" hbase?
>> > > > > > > > > Can't this functionality be done just from the client
>> level?
>> > > > > > > > > > What's the end goal here? Is the goal here to replace the
>> > > > > > > Bytes.toBytes(*)
>> > > > > > > > > > methods to enforce the ordering?
>> > > > > > > > > > If HBase has two mutually incompatible encodings
>> > "built-in",
>> > > > how
>> > > > > > > does a
>> > > > > > > > > dev know to use one or the other later on?
>> > > > > > > > > If this is essentially a mega import of a library (300k..
>> > > yikes)
>> > > > ,
>> > > > > > why
>> > > > > > > > not
>> > > > > > > > > make it a separate module instead of part of common?
>> > > > > > > > >
>> > > > > > > > > Jon.
>> > > > > > > > >
>> > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <
>> > > > ndimiduk@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi everyone,
>> > > > > > > > > >
>> > > > > > > > > > I'm of the opinion that HBase should provide a mechanism
>> > for
>> > > > > > > > serializing
>> > > > > > > > > > common java types such that the serialized format sorts
>> > > > according
>> > > > > > to
>> > > > > > > > > > the natural ordering of the type. I think many
>> application
>> > > > > efforts
>> > > > > > > end
>> > > > > > > > up
>> > > > > > > > > > building a custom, partial implementation of this kind
>> of
>> > > > > > > functionality
>> > > > > > > > > on
>> > > > > > > > > > their own. I think HBase should provide a canonical
>> > > > > implementation
>> > > > > > of
>> > > > > > > > > such
>> > > > > > > > > > a serialization format so that third-parties can
>> reliably
>> > > build
>> > > > > on
>> > > > > > > top
>> > > > > > > > of
>> > > > > > > > > > HBase. Not just user applications, but other tools like
>> Pig
>> > > and
>> > > > > > Hive
>> > > > > > > > are
>> > > > > > > > > > also enabled. Implementations for
>> > > > > > > > > > HIVE-3634<
>> https://issues.apache.org/jira/browse/HIVE-3634
>> > >,
>> > > > > > > > > > HIVE-2599 <
>> https://issues.apache.org/jira/browse/HIVE-2599
>> > >,
>> > > > or
>> > > > > > > > > > HIVE-2903<
>> https://issues.apache.org/jira/browse/HIVE-2903
>> > > > >could
>> > > > > be
>> > > > > > > > > > compatible with similar features in Pig.
>> > > > > > > > > >
>> > > > > > > > > > After implementing something similar on multiple
>> occasions,
>> > > > > > I stumbled
>> > > > > > > > > across
>> > > > > > > > > > the Orderly <https://github.com/ndimiduk/orderly>
>> library.
>> > > > It
>> > > > > > also
>> > > > > > > > > > appears to have been adopted by other large projects,
>> > > including
>> > > > > > > > > > Lily<https://github.com/NGDATA/orderly>.
>> > > > > > > > > > I've engaged the library's author for some improvements
>> > only
>> > > to
>> > > > > > find
>> > > > > > > > out
>> > > > > > > > > > he's now at Google and will no longer be maintaining it.
>> > > Thus,
>> > > > I
>> > > > > > > > propose
>> > > > > > > > > we
>> > > > > > > > > > take it into HBase.
>> > > > > > > > > >
>> > > > > > > > > > HBASE-7692 <
>> > https://issues.apache.org/jira/browse/HBASE-7692
>> > > >
>> > > > > > > > includes a
>> > > > > > > > > > patch that introduces Orderly into hbase-common under
>> the
>> > > > orderly
>> > > > > > > > > > namespace. I have an associated branch on
>> > > > > > > > > > > github<
>> > > > > > > > >
>> > > > >
>> https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization
>> > > > > > > > > > >wherein
>> > > > > > > > > > I've broken the patch out into multiple commits to ease
>> > > review.
>> > > > > > > > > > Please take a few minutes to give it a look.
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Nick
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > // Jonathan Hsieh (shay)
>> > > > > > > > > // Software Engineer, Cloudera
>> > > > > > > > > // jon@cloudera.com
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > // Jonathan Hsieh (shay)
>> > > > > > > // Software Engineer, Cloudera
>> > > > > > > // jon@cloudera.com
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > // Jonathan Hsieh (shay)
>> > > > > // Software Engineer, Cloudera
>> > > > > // jon@cloudera.com
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
On Fri, Feb 22, 2013 at 10:14 AM, Matt Corgan <mc...@hotpads.com> wrote:

> >
> > Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> > hbase-common.
>
> Oh, interesting.  Could we inline the code from Bytes.java and somehow get
> rid of the ImmutableBytesWritable.  Like calling packages can add
> ImmutableBytesWritable functionality on top if they want to?


I'll need to do a more thorough evaluation, but a cursory glance indicates
that the uses of Bytes could be replaced by System.arraycopy.
ImmutableBytesWritable is used mostly as a convenient wrapper over byte[]
and may well be replaceable with Hadoop's BytesWritable.
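
As a rough illustration of what "replaced by arraycopy" could look like, here
is a minimal sketch using only the JDK. The class name ByteRangeSketch and its
methods are hypothetical; they are not part of Orderly, HBase, or Hadoop, and
they only stand in for the "convenient wrapper over byte[]" role.

```java
/** Hypothetical stand-in for a byte[] wrapper; not an HBase or Hadoop class. */
final class ByteRangeSketch {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteRangeSketch(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    /** Copies the viewed range into a fresh array. */
    byte[] copy() {
        byte[] out = new byte[length];
        System.arraycopy(bytes, offset, out, 0, length);
        return out;
    }

    /** Concatenates two arrays using only System.arraycopy. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}
```

Whether the real code paths reduce to something this small is exactly what the
more thorough evaluation would have to confirm.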

> Seems like something as low level as rearranging bytes should be dependency
> free.
>

The implementation makes heavy use of Hadoop Writables, but its
dependencies on HBase are mostly a matter of convenience.


Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Matt Corgan <mc...@hotpads.com>.
>
> Not quite true. It makes use of Bytes and ImmutableBytesWritable from
> hbase-common.

Oh, interesting.  Could we inline the code from Bytes.java and somehow get
rid of ImmutableBytesWritable?  Calling packages could then add
ImmutableBytesWritable functionality on top if they want to.  Something as
low-level as rearranging bytes seems like it should be dependency-free.
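
To make "rearranging bytes" concrete, here is a minimal, dependency-free
sketch of an order-preserving encoding for a signed int. It is an
illustration only, not Orderly's actual format: the class OrderedIntSketch and
its methods are hypothetical names. The idea it shows is the usual one of
writing big-endian bytes with the sign bit flipped, so that an unsigned
lexicographic comparison of the bytes agrees with the natural int ordering.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Orderly or HBase: plain-JDK ordered int encoding. */
final class OrderedIntSketch {

    /** Big-endian bytes with the sign bit flipped, so negatives sort before positives. */
    static byte[] encode(int value) {
        int flipped = value ^ Integer.MIN_VALUE; // flips only the sign bit
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Inverse of encode. */
    static int decode(byte[] b) {
        int flipped = ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
                    | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
        return flipped ^ Integer.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison, the order HBase applies to row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] values = {-5, 0, 7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by raw bytes only, then decode to see the resulting order.
        Arrays.sort(encoded, OrderedIntSketch::compareUnsigned);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```

Nothing here needs Bytes, Writables, or any HBase module, which is the
property being argued for; a wrapper type could be layered on top by callers
that want one.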



Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
Inline.

On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <mc...@hotpads.com> wrote:

> To nitpick a little it wouldn't quite be a sibling of hbase-client because
> hbase-client depends on hbase-common and hbase-protocol while this new one
> will not depend on anything.  Would hbase-server be able to see it?  Would
> it basically be a standalone module being maintained by HBase?
>

Not quite true. It makes use of Bytes and ImmutableBytesWritable from
hbase-common.

> Also, assuming the original Orderly library goes unmaintained and we want
> people to use it, this will be the primary place to get it.  Having no
> dependencies on other hbase modules is important for people who want to use
> the Orderly library for something unrelated to hbase.  For example, a web
> application that logs data in this format but not directly to hbase.
>

Orderly has gone unmaintained. The only fork with any activity that I'm
aware of is my own. I'd much rather see it gain publicity, additional
scrutiny, and wider adoption than continue as a pet project.

> On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org> wrote:
>
> > Yep the client will be fully separated as soon as rpc changes
> > are stabilized.  Until then keeping up the move patch was just too
> onerous.
> >
> >
> > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jo...@cloudera.com>
> wrote:
> >
> > > Nick,
> > >
> > > I'm +1 for it having its own module, and being a sibling of
> hbase-client.
> > >  I'm assuming the client stuff will happen before we release 0.96 since
> > it
> > > has been started.
> > >
> > > Jon.
> > >
> > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <nd...@gmail.com>
> > wrote:
> > >
> > > > You're absolutely correct: this library introduces client-side
> > > conventions
> > > > and is not needed from within the HMaster or RegionServer. Is
> > > > > the consensus that it should reside in its own module or be a
> sibling
> > to
> > > > the o.a.h.hbase.client source tree? I'm a little confused by the
> > current
> > > > state of the modules; hbase-client looks empty while
> o.a.h.hbase.client
> > > > sits under hbase-server.
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jo...@cloudera.com>
> > > wrote:
> > > >
> > > > > So I buy the argument about this being included in hbase, but
> several
> > > of
> > > > > the questions still stand --
> > > > >
> > > > > Why is this part of hbase-common?  shouldn't this be just a
> > dependency
> > > of
> > > > > hbase-client module?  Does the hbase-server side need to depend on
> > > this?
> > > > >
> > > > > Since this is a large import of a currently isolated library, why
> not
> > > > make
> > > > > it a separate module instead of part of hbase-common?  This would
> > > > enforce a
> > > > > boundary that will prevent pollution from circular dependencies.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <en...@apache.org>
> > > wrote:
> > > > >
> > > > > > I think this belongs in core HBase, as a replacement to Bytes,
> > which
> > > > > should
> > > > > > be deprecated eventually. We have a Bytes utility which is
> supposed
> > > to
> > > > > > convert basic java types to byte[]'s, but it does not work for
> > signed
> > > > > > numbers.
> > > > > >
> > > > > > We already know that all of the clients, Hive, Pig, Phoenix, have
> > to
> > > > have
> > > > > > at least java type -> byte[] conversion utilities, and I think it
> > is
> > > > > > HBase's job to supply one so that different clients can
> > interoperate.
> > > > > Since
> > > > > > internally we are also relying on serializing java types, we need
> > > that
> > > > > > library in the core.
> > > > > >
> > > > > > BTW, I also think that we need to have a SQL-type to java type to
> > > > byte[]
> > > > > > layer, but that is another discussion.
> > > > > >
> > > > > > Enis
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <
> jon@cloudera.com>
> > > > > wrote:
> > > > > >
> > > > > > > Nick,
> > > > > > >
> > > > > > > While I believe having an order-preserving canonical
> > serialization
> > > > is a
> > > > > > > good idea,  from doing a read of the mail and a skim of the
> jira
> > it
> > > > is
> > > > > > not
> > > > > > > > clear to me why this is inside hbase as part of hbase-common.
> > > > > > >
> > > > > > > Why isn't this part of a library on top of hbase (a dependency
> > for
> > > > > > > Pig/Hive) instead of "inside" hbase?
> > > > > > > Can't this functionality be done just from the client level?
> > > > > > > > What's the end goal here? Is the goal here to replace the
> > > > > > Bytes.toBytes(*)
> > > > > > > > methods to enforce the ordering?
> > > > > > > > If HBase has two mutually incompatible encodings "built-in",
> > how
> > > > > does a
> > > > > > > dev know to use one or the other later on?
> > > > > > > If this is essentially a mega import of a library (300k..
> yikes)
> > ,
> > > > why
> > > > > > not
> > > > > > > make it a separate module instead of part of common?
> > > > > > >
> > > > > > > Jon.
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <
> > ndimiduk@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I'm of the opinion that HBase should provide a mechanism for
> > > > > > serializing
> > > > > > > > common java types such that the serialized format sorts
> > according
> > > > > to
> > > > > > > > the natural ordering of the type. I think many application
> > > efforts
> > > > > end
> > > > > > up
> > > > > > > > building a custom, partial implementation of this kind of
> > > > > functionality
> > > > > > > on
> > > > > > > > their own. I think HBase should provide a canonical
> > > implementation
> > > > of
> > > > > > > such
> > > > > > > > a serialization format so that third-parties can reliably
> build
> > > on
> > > > > top
> > > > > > of
> > > > > > > > HBase. Not just user applications, but other tools like Pig
> and
> > > > Hive
> > > > > > are
> > > > > > > > also enabled. Implementations for
> > > > > > > > HIVE-3634<https://issues.apache.org/jira/browse/HIVE-3634>,
> > > > > > > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>,
> > or
> > > > > > > > HIVE-2903<https://issues.apache.org/jira/browse/HIVE-2903
> > >could
> > > be
> > > > > > > > compatible with similar features in Pig.
> > > > > > > >
> > > > > > > > After implementing something similar on multiple occasions,
> > > > stumbled
> > > > > > > across
> > > > > > > > the Orderly <https://github.com/ndimiduk/orderly> library.
> > It's
> > > > also
> > > > > > > > appears to have been adopted by other large projects,
> including
> > > > > > > > Lily<https://github.com/NGDATA/orderly>.
> > > > > > > > I've engaged the library's author for some improvements only
> to
> > > > find
> > > > > > out
> > > > > > > > he's now at Google and will no longer be maintaining it.
> Thus,
> > I
> > > > > > propose
> > > > > > > we
> > > > > > > > take it into HBase.
> > > > > > > >
> > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692
> >
> > > > > > includes a
> > > > > > > > patch that introduces Orderly into hbase-common under the
> > orderly
> > > > > > > > namespace. I have an associated branch on
> > > > > > > > gihub<
> > > > > > >
> > > https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization
> > > > > > > > >wherein
> > > > > > > > I've broken the patch out into multiple commits to ease
> review.
> > > > > > > > Please take a few minutes to give it a look.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Nick
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > // Jonathan Hsieh (shay)
> > > > > > > // Software Engineer, Cloudera
> > > > > > > // jon@cloudera.com
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // Software Engineer, Cloudera
> > > > > // jon@cloudera.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com
> > >
> >
>

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Matt Corgan <mc...@hotpads.com>.
To nitpick a little, it wouldn't quite be a sibling of hbase-client, because
hbase-client depends on hbase-common and hbase-protocol while this new one
will not depend on anything.  Would hbase-server be able to see it?  Would
it basically be a standalone module being maintained by HBase?

Also, assuming the original Orderly library goes unmaintained and we want
people to use it, this will be the primary place to get it.  Having no
dependencies on other hbase modules is important for people who want to use
the Orderly library for something unrelated to hbase.  For example, a web
application might log data in this format without writing directly to hbase.


On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <ec...@apache.org> wrote:

> Yep the client will be fully separated as soon as rpc changes
> are stabilized.  Until then keeping up the move patch was just too onerous.

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Elliott Clark <ec...@apache.org>.
Yep, the client will be fully separated as soon as the rpc changes
are stabilized.  Until then, keeping up the move patch was just too onerous.


On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> Nick,
>
> I'm +1 for it having its own module, and being a sibling of hbase-client.
> I'm assuming the client stuff will happen before we release 0.96 since it
> has been started.
>
> Jon.

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Nick,

I'm +1 for it having its own module, and being a sibling of hbase-client.
I'm assuming the client stuff will happen before we release 0.96 since it
has been started.

Jon.

On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> You're absolutely correct: this library introduces client-side conventions
> and is not needed from within the HMaster or RegionServer. Is
> the consensus that it should reside in its own module or be a sibling to
> the o.a.h.hbase.client source tree? I'm a little confused by the current
> state of the modules; hbase-client looks empty while o.a.h.hbase.client
> sits under hbase-server.
>
> Thanks,
> Nick



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Ted <yu...@gmail.com>.
Elliott is working on making the hbase-client module concrete in HBASE-7012.

Cheers

On Feb 22, 2013, at 6:13 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> You're absolutely correct: this library introduces client-side conventions
> and is not needed from within the HMaster or RegionServer. Is
> the consensus that it should reside in its own module or be a sibling to
> the o.a.h.hbase.client source tree? I'm a little confused by the current
> state of the modules; hbase-client looks empty while o.a.h.hbase.client
> sits under hbase-server.
> 
> Thanks,
> Nick

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Nick Dimiduk <nd...@gmail.com>.
You're absolutely correct: this library introduces client-side conventions
and is not needed from within the HMaster or RegionServer. Is the
consensus that it should reside in its own module or be a sibling to
the o.a.h.hbase.client source tree? I'm a little confused by the current
state of the modules; hbase-client looks empty while o.a.h.hbase.client
sits under hbase-server.

Thanks,
Nick
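
A minimal sketch of why ordering is a client-side convention: HBase compares
row keys as unsigned bytes, so any numeric ordering has to come from how the
client encodes values before writing them. The code below is not Orderly's
actual format and calls no HBase API; the class OrderedIntSketch and its
method names are hypothetical, and it assumes a fixed-width, big-endian
encoding of a Java int. It shows that plain two's complement sorts negative
numbers after positive ones, and that flipping the sign bit is one simple way
to restore numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of Orderly or HBase.
final class OrderedIntSketch {

    // Plain big-endian two's complement (ByteBuffer is big-endian by default).
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Flip the sign bit so that negative values get the smaller leading byte.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(): flip the sign bit back after reading.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the way a byte[]-keyed store sorts.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 is FF FF FF FF, so it sorts after 1.
        System.out.println(compareUnsigned(plain(-1), plain(1)) > 0);
        // Sign-flipped encoding: -1 sorts before 1, matching numeric order.
        System.out.println(compareUnsigned(ordered(-1), ordered(1)) < 0);
        // The encoding round-trips.
        System.out.println(decodeOrdered(ordered(-42)) == -42);
    }
}
```

Nothing here needs to run inside the HMaster or RegionServer: the server only
ever sees opaque bytes, which is why the encoding can live in a module with
no dependency on the server side.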

On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> So I buy the argument about this being included in hbase, but several of
> the questions still stand --
>
> Why is this part of hbase-common?  Shouldn't this be just a dependency of
> hbase-client module?  Does the hbase-server side need to depend on this?
>
> Since this is a large import of a currently isolated library, why not make
> it a separate module instead of part of hbase-common?  This would enforce a
> boundary that will prevent pollution from circular dependencies.
>
> Jon.

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Jonathan Hsieh <jo...@cloudera.com>.
So I buy the argument about this being included in hbase, but several of
the questions still stand:

Why is this part of hbase-common? Shouldn't this just be a dependency of
the hbase-client module? Does the hbase-server side need to depend on this?

Since this is a large import of a currently isolated library, why not make
it a separate module instead of part of hbase-common? This would enforce a
boundary that prevents pollution from circular dependencies.

Jon.

On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <en...@apache.org> wrote:

> I think this belongs in core HBase, as a replacement for Bytes, which should
> be deprecated eventually. We have a Bytes utility which is supposed to
> convert basic Java types to byte[]'s, but it does not work for signed
> numbers: negative values do not sort before positive ones.
>
> We already know that all of the clients, Hive, Pig, Phoenix, have to have
> at least Java type -> byte[] conversion utilities, and I think it is
> HBase's job to supply one so that different clients can interoperate. Since
> internally we are also relying on serializing Java types, we need that
> library in the core.
>
> BTW, I also think that we need to have a SQL type to Java type to byte[]
> layer, but that is another discussion.
>
> Enis
>
>
> On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:
>
> > Nick,
> >
> > While I believe having an order-preserving canonical serialization is a
> > good idea, from doing a read of the mail and a skim of the jira it is
> > not clear to me why this is inside hbase as part of hbase-common.
> >
> > Why isn't this part of a library on top of hbase (a dependency for
> > Pig/Hive) instead of "inside" hbase?
> > Can't this functionality be done just from the client level?
> > What's the end goal here? Is the goal here to replace the Bytes.toBytes(*)
> > methods to enforce the ordering?
> > If HBase has two mutually incompatible encodings "built-in", how does a
> > dev know to use one or the other later on?
> > If this is essentially a mega import of a library (300k... yikes), why
> > not make it a separate module instead of part of common?
> >
> > Jon.
> >
> > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <nd...@gmail.com>
> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm of the opinion that HBase should provide a mechanism for serializing
> > > common Java types such that the serialized format sorts according to
> > > the natural ordering of the type. I think many application efforts end
> > > up building a custom, partial implementation of this kind of
> > > functionality on their own. I think HBase should provide a canonical
> > > implementation of such a serialization format so that third parties
> > > can reliably build on top of HBase. Not just user applications, but
> > > other tools like Pig and Hive are also enabled. Implementations for
> > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> > > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> > > compatible with similar features in Pig.
> > >
> > > After implementing something similar on multiple occasions, I stumbled
> > > across the Orderly <https://github.com/ndimiduk/orderly> library. It also
> > > appears to have been adopted by other large projects, including
> > > Lily <https://github.com/NGDATA/orderly>.
> > > I've engaged the library's author for some improvements only to find
> > > out he's now at Google and will no longer be maintaining it. Thus, I
> > > propose we take it into HBase.
> > >
> > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> > > patch that introduces Orderly into hbase-common under the orderly
> > > namespace. I have an associated branch on
> > > GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>,
> > > wherein I've broken the patch out into multiple commits to ease review.
> > > Please take a few minutes to give it a look.
> > >
> > > Thanks,
> > > Nick
> > >
> >
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Enis Söztutar <en...@apache.org>.
I think this belongs in core HBase, as a replacement for Bytes, which should
be deprecated eventually. We have a Bytes utility which is supposed to
convert basic Java types to byte[]'s, but it does not work for signed
numbers: negative values do not sort before positive ones.
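
To illustrate the signed-number point, here is a minimal sketch. It is not
HBase or Orderly code: the class and method names are made up for this
example, and encodePlain only assumes that a Bytes.toBytes(int)-style helper
writes a big-endian two's-complement value. It shows that such bytes sort
incorrectly under unsigned lexicographic comparison, and that flipping the
sign bit is one common way to restore the natural ordering.

```java
import java.nio.ByteBuffer;

final class SignedOrderDemo {

    // Plain big-endian two's-complement encoding (assumed to mirror what a
    // Bytes.toBytes(int)-style helper produces).
    static byte[] encodePlain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Order-preserving variant: flip the sign bit so negative values start
    // with a 0 bit and non-negative values start with a 1 bit.
    static byte[] encodeOrdered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of encodeOrdered: read the int and flip the sign bit back.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain: -1 encodes as ff ff ff ff and sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(encodePlain(-1), encodePlain(1)) > 0);
        // Sign-flipped: -1 encodes as 7f ff ff ff, 1 as 80 00 00 01.
        System.out.println(compareUnsigned(encodeOrdered(-1), encodeOrdered(1)) < 0);
        // The transformation is reversible.
        System.out.println(decodeOrdered(encodeOrdered(-42)) == -42);
    }
}
```

All three lines print true. A real order-preserving format also has to deal
with variable-length values, nulls, floating point, and descending order,
none of which this sketch attempts.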

We already know that all of the clients, Hive, Pig, Phoenix, have to have
at least Java type -> byte[] conversion utilities, and I think it is
HBase's job to supply one so that different clients can interoperate. Since
internally we are also relying on serializing Java types, we need that
library in the core.

BTW, I also think that we need to have a SQL type to Java type to byte[]
layer, but that is another discussion.

Enis


On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> Nick,
>
> While I believe having an order-preserving canonical serialization is a
> good idea, from doing a read of the mail and a skim of the jira it is not
> clear to me why this is inside hbase as part of hbase-common.
>
> Why isn't this part of a library on top of hbase (a dependency for
> Pig/Hive) instead of "inside" hbase?
> Can't this functionality be done just from the client level?
> What's the end goal here? Is the goal here to replace the Bytes.toBytes(*)
> methods to enforce the ordering?
> If HBase has two mutually incompatible encodings "built-in", how does a
> dev know to use one or the other later on?
> If this is essentially a mega import of a library (300k... yikes), why not
> make it a separate module instead of part of common?
>
> Jon.
>
> On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I'm of the opinion that HBase should provide a mechanism for serializing
> > common Java types such that the serialized format sorts according to
> > the natural ordering of the type. I think many application efforts end up
> > building a custom, partial implementation of this kind of functionality
> > on their own. I think HBase should provide a canonical implementation of
> > such a serialization format so that third parties can reliably build on
> > top of HBase. Not just user applications, but other tools like Pig and
> > Hive are also enabled. Implementations for
> > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> > compatible with similar features in Pig.
> >
> > After implementing something similar on multiple occasions, I stumbled
> > across the Orderly <https://github.com/ndimiduk/orderly> library. It also
> > appears to have been adopted by other large projects, including
> > Lily <https://github.com/NGDATA/orderly>.
> > I've engaged the library's author for some improvements only to find out
> > he's now at Google and will no longer be maintaining it. Thus, I propose
> > we take it into HBase.
> >
> > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> > patch that introduces Orderly into hbase-common under the orderly
> > namespace. I have an associated branch on
> > GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>,
> > wherein I've broken the patch out into multiple commits to ease review.
> > Please take a few minutes to give it a look.
> >
> > Thanks,
> > Nick
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>

Re: Review request for HBASE-7692: Ordered byte[] serialization

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Nick,

While I believe having an order-preserving canonical serialization is a
good idea, from doing a read of the mail and a skim of the jira it is not
clear to me why this is inside hbase as part of hbase-common.

Why isn't this part of a library on top of hbase (a dependency for
Pig/Hive) instead of "inside" hbase?
Can't this functionality be done just from the client level?
What's the end goal here? Is the goal here to replace the Bytes.toBytes(*)
methods to enforce the ordering?
If HBase has two mutually incompatible encodings "built-in", how does a
dev know to use one or the other later on?
If this is essentially a mega import of a library (300k... yikes), why not
make it a separate module instead of part of common?

Jon.

On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> Hi everyone,
>
> I'm of the opinion that HBase should provide a mechanism for serializing
> common Java types such that the serialized format sorts according to
> the natural ordering of the type. I think many application efforts end up
> building a custom, partial implementation of this kind of functionality on
> their own. I think HBase should provide a canonical implementation of such
> a serialization format so that third parties can reliably build on top of
> HBase. Not just user applications, but other tools like Pig and Hive are
> also enabled. Implementations for
> HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> compatible with similar features in Pig.
>
> After implementing something similar on multiple occasions, I stumbled
> across the Orderly <https://github.com/ndimiduk/orderly> library. It also
> appears to have been adopted by other large projects, including
> Lily <https://github.com/NGDATA/orderly>.
> I've engaged the library's author for some improvements only to find out
> he's now at Google and will no longer be maintaining it. Thus, I propose we
> take it into HBase.
>
> HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> patch that introduces Orderly into hbase-common under the orderly
> namespace. I have an associated branch on
> GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>,
> wherein I've broken the patch out into multiple commits to ease review.
> Please take a few minutes to give it a look.
>
> Thanks,
> Nick
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com