Posted to user@hbase.apache.org by Doug Meil <do...@explorysmedical.com> on 2013/01/15 23:01:50 UTC

Re: Constructing rowkeys and HBASE-7221

Hi there, well, this request for input fell with a thud.  :-)

But I think that's perhaps because I sent it to the dev list instead of
the user list, and the people actively writing HBase itself (devs) need
less help with key-building utilities.

So one last request for feedback, but this time aimed at users of HBase:
how has your key-building experience been?

Thanks!



On 1/7/13 11:04 AM, "Doug Meil" <do...@explorysmedical.com> wrote:

>
>Greetings folks-
>
>I would like to restart the conversation on
>https://issues.apache.org/jira/browse/HBASE-7221 because there continue
>to be conversations on the dist-list about creating composite rowkeys,
>and while HBase makes just about anything possible, it doesn't make
>composite rowkeys easy.
>
>What I'm lobbying for is a utility class (see the v3 patch in HBASE-7221)
>that can both create and read rowkeys (so this isn't just a one-way
>builder pattern).
>
>This is currently stuck because it was noted that Bytes has a sort-order
>issue with numbers: if a key holds both negative and positive values, the
>encoded bytes don't sort in numeric order. That's really a separate issue,
>but because this patch uses Bytes, it's related.
>
>What are people's thoughts on this topic in general, and on the v3 version
>of the patch (and the last set of comments) specifically?  Thanks!
>
>One of the unit tests shows an example of usage.  The last set of
>comments suggested that RowKey be renamed FixedLengthRowKey, which I
>think is a good idea.  A follow-on patch could include
>VariableLengthRowKey for folks that use strings in the rowkeys.
>
>
>  public void testCreate() throws Exception {
>
>    // hashVal, intVal and longVal are fixtures defined elsewhere in the test class
>    int[] elements = {RowKeySchema.SIZEOF_MD5_HASH,
>        RowKeySchema.SIZEOF_INT, RowKeySchema.SIZEOF_LONG};
>    RowKeySchema schema = new RowKeySchema(elements);
>
>    RowKey rowkey = schema.createRowKey();
>    rowkey.setHash(0, hashVal);
>    rowkey.setInt(1, intVal);
>    rowkey.setLong(2, longVal);
>
>    byte[] bytes = rowkey.getBytes();
>    Assert.assertEquals("key length", schema.getRowKeyLength(), bytes.length);
>
>    // JUnit's assertEquals takes (message, expected, actual)
>    Assert.assertEquals("e1", intVal, rowkey.getInt(1));
>    Assert.assertEquals("e2", longVal, rowkey.getLong(2));
>  }
>
>Doug Meil
>Chief Software Architect, Explorys
>doug.meil@explorys.com
>
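
A minimal, self-contained sketch of the Bytes sort-order issue mentioned in the 1/7 message, and of one common fix (flipping the sign bit). It uses java.nio.ByteBuffer instead of HBase's Bytes, on the assumption that Bytes.toBytes(int) writes the same big-endian two's-complement layout and that row keys are compared as unsigned bytes. The class and method names are hypothetical, and the sign-bit flip is not necessarily what HBASE-7221 ended up adopting.

```java
import java.nio.ByteBuffer;

public class SignedKeyOrder {

    // Big-endian two's-complement encoding: the same byte layout that
    // HBase's Bytes.toBytes(int) produces.
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Order-preserving variant: flipping the sign bit maps
    // Integer.MIN_VALUE..Integer.MAX_VALUE onto 0x00000000..0xFFFFFFFF,
    // so unsigned byte order matches numeric order.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(): flip the sign bit back.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, which is how HBase orders row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -1 encodes as FF FF FF FF and 1 as 00 00 00 01, so plain -1 sorts after 1.
        System.out.println("plain   -1 < 1: " + (compareUnsigned(plain(-1), plain(1)) < 0));   // false
        System.out.println("ordered -1 < 1: " + (compareUnsigned(ordered(-1), ordered(1)) < 0)); // true
        System.out.println("round trip: " + decodeOrdered(ordered(-42)));                      // -42
    }
}
```

The same trick works for longs with Long.MIN_VALUE. Floats and doubles need an extra step (negative values must also have their remaining bits inverted), which this sketch does not cover.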



Re: Constructing rowkeys and HBASE-7221

Posted by Doug Meil <do...@explorysmedical.com>.
Thanks, Aaron!

I will take a look at Kiji.  And I think it underscores the need for some
kind of rowkey building/parsing utility in HBase, because one of the first
things folks tend to do when they start using HBase is write their own
key-builder utility (a sentiment also expressed by others in the
HBASE-7221 ticket comments).

It's good that you have full control over the rowkey (i.e., byte[]) as a
backstop, but HBase should also try to make things a bit easier for some
common cases.  I think it will help adoption.

The general idea is a FixedLengthRowKey and a VariableLengthRowKey along
with a RowKeySchema class, and I think the variant you bring up (a hash
prefix vs. the full hash) is a great idea.  Let's keep this ball rolling!
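
To make the "create and read" idea concrete, here is a minimal sketch of a fixed-length key layout along the lines of the FixedLengthRowKey/RowKeySchema pair discussed in this thread. Every name in it (FixedLengthKeySketch, setInt, getLong, and so on) is hypothetical and not taken from the HBASE-7221 patch. It assumes plain Java 8+ with no HBase dependency, and it flips the sign bit of numeric fields so that mixed negative and positive values sort in numeric order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of a fixed-length, two-way row key layout.
// Not the API from the HBASE-7221 patch.
public class FixedLengthKeySketch {
    private final int[] widths;
    private final int[] offsets;
    private final int length;

    public FixedLengthKeySketch(int... widths) {
        this.widths = widths.clone();
        this.offsets = new int[widths.length];
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            offsets[i] = pos; // each field starts where the previous one ends
            pos += widths[i];
        }
        this.length = pos;
    }

    public int length() {
        return length;
    }

    public byte[] newKey() {
        return new byte[length];
    }

    private void checkWidth(int field, int expected) {
        if (widths[field] != expected) {
            throw new IllegalArgumentException(
                "field " + field + " is " + widths[field] + " bytes, not " + expected);
        }
    }

    // Raw bytes (e.g. an MD5 digest) are copied in as-is.
    public void setBytes(byte[] key, int field, byte[] value) {
        checkWidth(field, value.length);
        System.arraycopy(value, 0, key, offsets[field], value.length);
    }

    public byte[] getBytes(byte[] key, int field) {
        return Arrays.copyOfRange(key, offsets[field], offsets[field] + widths[field]);
    }

    // Ints and longs are written big-endian with the sign bit flipped,
    // so negative values sort before positive ones.
    public void setInt(byte[] key, int field, int v) {
        checkWidth(field, Integer.BYTES);
        ByteBuffer.wrap(key, offsets[field], Integer.BYTES).putInt(v ^ Integer.MIN_VALUE);
    }

    public int getInt(byte[] key, int field) {
        checkWidth(field, Integer.BYTES);
        return ByteBuffer.wrap(key, offsets[field], Integer.BYTES).getInt() ^ Integer.MIN_VALUE;
    }

    public void setLong(byte[] key, int field, long v) {
        checkWidth(field, Long.BYTES);
        ByteBuffer.wrap(key, offsets[field], Long.BYTES).putLong(v ^ Long.MIN_VALUE);
    }

    public long getLong(byte[] key, int field) {
        checkWidth(field, Long.BYTES);
        return ByteBuffer.wrap(key, offsets[field], Long.BYTES).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Same layout as the unit test in this thread: 16-byte MD5 hash, int, long.
        FixedLengthKeySketch schema = new FixedLengthKeySketch(16, Integer.BYTES, Long.BYTES);
        byte[] key = schema.newKey();
        schema.setBytes(key, 0, new byte[16]); // placeholder for a real MD5 digest
        schema.setInt(key, 1, -5);
        schema.setLong(key, 2, 1234567890123L);
        System.out.println(key.length);             // 28
        System.out.println(schema.getInt(key, 1));  // -5
        System.out.println(schema.getLong(key, 2)); // 1234567890123
    }
}
```

Because every field has a fixed width, the offsets are computed once from the schema and any field can be read back without parsing the fields before it. A variable-length layout (for strings, say) would need delimiters or length prefixes instead.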



On 1/16/13 2:06 PM, "Aaron Kimball" <ak...@gmail.com> wrote:

>Hi Doug,
>
>This HBase feature is really interesting. It is quite related to some work
>we're doing on Kiji, our schema management project. In particular, we've
>also been focusing on building composite row keys correctly. One thing that
>jumped out at me in that ticket is that with a composition of md5hash and
>other (string, int, etc.) components, you probably don't want the whole
>hash. If you're using that to shard your rows more efficiently across
>regions, you might want to just use a subset of the md5 bytes as a prefix.
>It might be a good idea to offer users control of this.
>
>Our own thoughts on this on the Kiji side are being tracked at
>https://jira.kiji.org/browse/schema-3 where we have a design doc that goes
>into a bit more detail.
>
>Cheers,
>- Aaron


Re: Constructing rowkeys and HBASE-7221

Posted by Aaron Kimball <ak...@gmail.com>.
Hi Doug,

This HBase feature is really interesting. It is quite related to some work
we're doing on Kiji, our schema management project. In particular, we've
also been focusing on building composite row keys correctly. One thing that
jumped out at me in that ticket is that with a composition of md5hash and
other (string, int, etc.) components, you probably don't want the whole
hash. If you're using that to shard your rows more efficiently across
regions, you might want to just use a subset of the md5 bytes as a prefix.
It might be a good idea to offer users control of this.
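
A minimal sketch of the suggestion above: prefix the natural key with only the first few bytes of its MD5 digest rather than the full 16-byte hash, with the prefix length left up to the caller. The class and method names, and the choice to append the natural key as UTF-8 after the prefix, are illustrative assumptions rather than Kiji's or HBase's actual design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: row key = first prefixLen bytes of MD5(naturalKey) + naturalKey.
public class HashPrefixKeySketch {

    static byte[] prefixedKey(String naturalKey, int prefixLen) {
        if (prefixLen < 0 || prefixLen > 16) {
            throw new IllegalArgumentException("MD5 prefix must be 0..16 bytes");
        }
        byte[] raw = naturalKey.getBytes(StandardCharsets.UTF_8);
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(raw);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] key = new byte[prefixLen + raw.length];
        System.arraycopy(digest, 0, key, 0, prefixLen);        // short hash prefix spreads rows
        System.arraycopy(raw, 0, key, prefixLen, raw.length);  // natural key keeps it unique
        return key;
    }

    // Because the prefix length is fixed, the natural key can be read back directly.
    static String naturalKey(byte[] key, int prefixLen) {
        return new String(key, prefixLen, key.length - prefixLen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = prefixedKey("user-00042", 2);
        System.out.println(key.length);          // 12
        System.out.println(naturalKey(key, 2));  // user-00042
    }
}
```

Keeping the natural key after the prefix keeps each row key unique and recoverable, and a short prefix can be enough to spread otherwise sequential keys across regions. The trade-off is that a scan over a range of natural keys then has to be issued once per possible prefix value.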

Our own thoughts on this on the Kiji side are being tracked at
https://jira.kiji.org/browse/schema-3 where we have a design doc that goes
into a bit more detail.

Cheers,
- Aaron

