You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Clint Morgan <cm...@troove.net> on 2008/04/22 19:31:02 UTC

Secondary indexes

All,

We want to put secondary indexes into hbase. The motivation is that we
are storing data in hbase that we want to serve to users. We would
like to be able to serve rows sorted by column values. Our queries
will be over rows with a given key prefix, so we should not be hitting
to many regions.

I was thinking it would work roughly like this:

- At table creation time, individual columns can be declared as
indexed. By default we could sort the column values lexicographically,
or we can provide a WritableComparatorFactory<T> which has the ability
to make values of type T from a byte [], as well as providing a
Comparator<T>. (Better than providing a Comparator<byte[]> as it only
costs once per row insert for deserialization, rather that twice on
each comparison).

- We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
which keeps the column values in order, and maps them back to row
keys. First cut may just keep all this in memory, but it should be
backed with MapFile(s).

- Add to the hregion the ability to scan through keys in column order.
Just iterate through the SortedMap, run a filter on the key, and if it
passes do a get on the row.

- Add a ColumnOrderedClientScanner which will open column order
scanners to all applicable hregions, and continuously pick row with
the lowest column value from each of the client scanners.

- Region splits should be easy enough, just a scan through the
SortedMap to partition.

Of course, the index could also be used for more efficient querying on
the indexed column's values.

Do other users have a need for this functionality?

What do developers think about this? I know hbase is more intended for
back-end batch style processing, but we have this need.

Cheers,
-clint

Re: Secondary indexes

Posted by stack <st...@duboce.net>.
Clint Morgan wrote:
> ...
>>  This scanner would have a significant client-side component to do the
>> arbitrage between all regions to figure the lowest column value?  If you had
>> a new type of 'region' -- one denoted by lowest and upper column then the
>> client-side logic would fade away and your scanner would look like current
>> scanners.
>>     
>
> Is this essentially the same as Bryan's suggestion of just maintaining
> another hbase table?
>
>   
No.  Currently client proceeds through regions in order by asking the 
.META. for next region each time its done w/ the current one.  You do 
not have an authority like .META. to ask, not unless you did Bryan's 
suggestion (My undeveloped notion was some messy facade on Regions that 
would answer the "whats your lowest/highest column?" for the aggregating 
client).

St.Ack

Re: Secondary indexes

Posted by Clint Morgan <cl...@gmail.com>.
>  I don't follow what the Factory adds.

The Factory part of the name just means that it can make an object of
type T from a byte[] (the column value). This is the type that we keep
in the set and sort on.

>
>  We're talking about getting HBASE-82 into 0.2.  Does that interfere with
> this proposal?

I don't think it would get in the way.

>  Would be sweet if you could leverage the HBase memcache code and flusher to
> do the above.

Agreed.

>  This Map would be global for the table?  Or per Region?

Was thinking one per region. Most of our order-by queries should hit
just a few regions due to key prefix.

>  A lucene index wouldn't work for you because you want ordering?

Thought about lucene a bit. Looks like it can provide ordering, but
for only basic types. Anyways, we could still probably make it work,
but it seems more heavyweight, especially if we just want ordering. Am
still open to this though.

>  You'd be random reading rows.  You're OK w/ current performance?  (For sure
> it will only improve but....).

Yeah, random reads are concern, but this should still be improvement
of our current approach to ordered-by.

>  This scanner would have a significant client-side component to do the
> arbitrage between all regions to figure the lowest column value?  If you had
> a new type of 'region' -- one denoted by lowest and upper column then the
> client-side logic would fade away and your scanner would look like current
> scanners.

Is this essentially the same as Bryan's suggestion of just maintaining
another hbase table?

>  Splits would not be row-based and run as they currently do, but rather
> sorted-column based?

No, I just meant when we split a region, it is easy to split the
column's SortedSet for each daughter.

>  How are you thinking of adding in this new functionality?  Subclassing
> HRegionServer?

Was thinking easiest / most isolated approach may be to sublcass
HRegionServer to intercept puts, maintain SortedSets, and add new
interface for obtaining ordered-by scanner.

But more complete solution may be to modify/subclass HRegion:
intercept puts here, store sorted column mapfiles along side other
region data, handle splitting, scanning...

>  St.Ack
>
>
> > Cheers,
> > -clint
> >
> >
>
>

Re: Secondary indexes

Posted by "Edward J. Yoon" <ed...@udanax.org>.
I remember someone objecting to the scheme. :)

On Wed, Apr 23, 2008 at 6:46 AM, Clint Morgan <cl...@gmail.com> wrote:
> The way I picture it, we only need to check the 100 or 1000 regions to
> get the very first row of a order-by scan. Afterwards, we just need to
> get the next row from the region that we took the last row from,
> compare with the existing rows we have, and return the lowest.
>
> So if we have R regions to grab from, the first call to next() will
> have to fetch all R rows from the regions and do R log R comparisons
> to do the sort. Then each call to next() will cost 1 row fetch from
> region plus log R comparisons to put in a sorted tree.
>
>
>
> On Tue, Apr 22, 2008 at 1:10 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> > If you just want the first 10 by a certain prefix ordered by a column, then
> > you will definitely be better off scanning them by row and ordering them
> > clientside.
> >
> >  Surely your idea of maintaining a separate SortedMap in each region would
> > work, but I don't think you should discount the cost associated with having
> > to talk to a bunch of different regions every time you want to know what the
> > next row is. You'll do a lot of extra work to get that merged view of the
> > index, and potentially the approach won't scale up to queries that might
> > cover more than a "few" regions - can you imagine having to check 100 or
> > 1000 regions for the next result every time you needed to iterate?
> >
> >
> >
> >  On Apr 22, 2008, at 12:58 PM, Clint Morgan wrote:
> >
> >
> > > Yeah, that would be an easy approach. We would need HBASE-82.
> > >
> > > The main problem I see here is that we cannot take (as much) advantage
> > > of our row key prefix in weeding out rows.
> > >
> > > Say I want the first 10 rows that start with XXX ordered by A:amount,
> > > then I would have scan through column values from rows everywhere in
> > > the table until I hit 10 with my prefix. Could be costly if table is
> > > large compared to the number of rows that start with XXX.
> > >
> > > Whereas if we have one SortedMap per Region, then I can quickly narrow
> > > down to (hopefully) a few regions based on key prefix.
> > >
> > > Though other usage / table loading patterns would surely benefit from
> > > this approach...
> > >
> > > On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> > >
> > > > This doesn't have to be all that complicated.
> > > >
> > > >  Why not keep another HBase table as the index? The keys would be the
> > column
> > > > values. There'd be a single matchingRows column family, and the
> > qualifiers
> > > > and values would be the rows that match that column. Then, when you want
> > to
> > > > scan in column order instead of row order, you scan the index table,
> > find
> > > > the list of rows that match each column, and then do a random read to
> > grab
> > > > those individually. It'll for sure be slower than scanning a table
> > ordered
> > > > by rows, but it'll get you what you want. It'll also handle the case
> > where
> > > > the column values aren't unique.
> > > >
> > > >  If you need custom sorting for those values, then HBASE-82 would solve
> > that
> > > > problem.
> > > >
> > > >  -Bryan
> > > >
> > > >
> > > >
> > > >  On Apr 22, 2008, at 11:58 AM, stack wrote:
> > > >
> > > >
> > > >
> > > > > Some questions interlaced below:
> > > > >
> > > > > Clint Morgan wrote:
> > > > >
> > > > >
> > > > > > All,
> > > > > >
> > > > > > We want to put secondary indexes into hbase. The motivation is that
> > we
> > > > > > are storing data in hbase that we want to serve to users. We would
> > > > > > like to be able to serve rows sorted by column values. Our queries
> > > > > > will be over rows with a given key prefix, so we should not be
> > hitting
> > > > > > to many regions.
> > > > > >  I was thinking it would work roughly like this:
> > > > > >
> > > > > > - At table creation time, individual columns can be declared as
> > > > > > indexed. By default we could sort the column values
> > lexicographically,
> > > > > > or we can provide a WritableComparatorFactory<T> which has the
> > ability
> > > > > > to make values of type T from a byte [], as well as providing a
> > > > > > Comparator<T>. (Better than providing a Comparator<byte[]> as it
> > only
> > > > > > costs once per row insert for deserialization, rather that twice on
> > > > > > each comparison).
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > I don't follow what the Factory adds.
> > > > >
> > > > > We're talking about getting HBASE-82 into 0.2.  Does that interfere
> > with
> > > > >
> > > > this proposal?  (I'm thinking that along w/ rows becoming byte arrays
> > rather
> > > > Text with a client-supplied Comparator, column qualifiers would shift to
> > be
> > > > byte arrays also -- though yeah, implies that if your sort is not
> > > > byte-lexicographical, yes, the compares can be costly involving two
> > > > deserializations per compare).
> > > >
> > > > >
> > > > >
> > > > > > - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> > > > > > which keeps the column values in order, and maps them back to row
> > > > > > keys. First cut may just keep all this in memory, but it should be
> > > > > > backed with MapFile(s).
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > Would be sweet if you could leverage the HBase memcache code and
> > flusher
> > > > >
> > > > to do the above.
> > > >
> > > > >
> > > > > This Map would be global for the table?  Or per Region?
> > > > >
> > > > > A lucene index wouldn't work for you because you want ordering?
> > > > >
> > > > >
> > > > >
> > > > > > - Add to the hregion the ability to scan through keys in column
> > order.
> > > > > > Just iterate through the SortedMap, run a filter on the key, and if
> > it
> > > > > > passes do a get on the row.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > You'd be random reading rows.  You're OK w/ current performance?  (For
> > > > >
> > > > sure it will only improve but....).
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > - Add a ColumnOrderedClientScanner which will open column order
> > > > > > scanners to all applicable hregions, and continuously pick row with
> > > > > > the lowest column value from each of the client scanners.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > This scanner would have a significant client-side component to do the
> > > > >
> > > > arbitrage between all regions to figure the lowest column value?  If you
> > had
> > > > a new type of 'region' -- one denoted by lowest and upper column then
> > the
> > > > client-side logic would fade away and your scanner would look like
> > current
> > > > scanners.
> > > >
> > > > >
> > > > >
> > > > > > - Region splits should be easy enough, just a scan through the
> > > > > > SortedMap to partition.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > Splits would not be row-based and run as they currently do, but rather
> > > > >
> > > > sorted-column based?
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > Of course, the index could also be used for more efficient querying
> > on
> > > > > > the indexed column's values.
> > > > > >
> > > > > > Do other users have a need for this functionality?
> > > > > >
> > > > > > What do developers think about this? I know hbase is more intended
> > for
> > > > > > back-end batch style processing, but we have this need.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > How are you thinking of adding in this new functionality?  Subclassing
> > > > >
> > > > HRegionServer?
> > > >
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > > Cheers,
> > > > > > -clint
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>



-- 
B. Regards,
Edward J. Yoon

Re: Secondary indexes

Posted by "Edward J. Yoon" <ed...@udanax.org>.
I remember someone objecting to the scheme. :)

On Wed, Apr 23, 2008 at 6:46 AM, Clint Morgan <cl...@gmail.com> wrote:
> The way I picture it, we only need to check the 100 or 1000 regions to
> get the very first row of a order-by scan. Afterwards, we just need to
> get the next row from the region that we took the last row from,
> compare with the existing rows we have, and return the lowest.
>
> So if we have R regions to grab from, the first call to next() will
> have to fetch all R rows from the regions and do R log R comparisons
> to do the sort. Then each call to next() will cost 1 row fetch from
> region plus log R comparisons to put in a sorted tree.
>
>
>
> On Tue, Apr 22, 2008 at 1:10 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> > If you just want the first 10 by a certain prefix ordered by a column, then
> > you will definitely be better off scanning them by row and ordering them
> > clientside.
> >
> >  Surely your idea of maintaining a separate SortedMap in each region would
> > work, but I don't think you should discount the cost associated with having
> > to talk to a bunch of different regions every time you want to know what the
> > next row is. You'll do a lot of extra work to get that merged view of the
> > index, and potentially the approach won't scale up to queries that might
> > cover more than a "few" regions - can you imagine having to check 100 or
> > 1000 regions for the next result every time you needed to iterate?
> >
> >
> >
> >  On Apr 22, 2008, at 12:58 PM, Clint Morgan wrote:
> >
> >
> > > Yeah, that would be an easy approach. We would need HBASE-82.
> > >
> > > The main problem I see here is that we cannot take (as much) advantage
> > > of our row key prefix in weeding out rows.
> > >
> > > Say I want the first 10 rows that start with XXX ordered by A:amount,
> > > then I would have scan through column values from rows everywhere in
> > > the table until I hit 10 with my prefix. Could be costly if table is
> > > large compared to the number of rows that start with XXX.
> > >
> > > Whereas if we have one SortedMap per Region, then I can quickly narrow
> > > down to (hopefully) a few regions based on key prefix.
> > >
> > > Though other usage / table loading patterns would surely benefit from
> > > this approach...
> > >
> > > On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> > >
> > > > This doesn't have to be all that complicated.
> > > >
> > > >  Why not keep another HBase table as the index? The keys would be the
> > column
> > > > values. There'd be a single matchingRows column family, and the
> > qualifiers
> > > > and values would be the rows that match that column. Then, when you want
> > to
> > > > scan in column order instead of row order, you scan the index table,
> > find
> > > > the list of rows that match each column, and then do a random read to
> > grab
> > > > those individually. It'll for sure be slower than scanning a table
> > ordered
> > > > by rows, but it'll get you what you want. It'll also handle the case
> > where
> > > > the column values aren't unique.
> > > >
> > > >  If you need custom sorting for those values, then HBASE-82 would solve
> > that
> > > > problem.
> > > >
> > > >  -Bryan
> > > >
> > > >
> > > >
> > > >  On Apr 22, 2008, at 11:58 AM, stack wrote:
> > > >
> > > >
> > > >
> > > > > Some questions interlaced below:
> > > > >
> > > > > Clint Morgan wrote:
> > > > >
> > > > >
> > > > > > All,
> > > > > >
> > > > > > We want to put secondary indexes into hbase. The motivation is that
> > we
> > > > > > are storing data in hbase that we want to serve to users. We would
> > > > > > like to be able to serve rows sorted by column values. Our queries
> > > > > > will be over rows with a given key prefix, so we should not be
> > hitting
> > > > > > to many regions.
> > > > > >  I was thinking it would work roughly like this:
> > > > > >
> > > > > > - At table creation time, individual columns can be declared as
> > > > > > indexed. By default we could sort the column values
> > lexicographically,
> > > > > > or we can provide a WritableComparatorFactory<T> which has the
> > ability
> > > > > > to make values of type T from a byte [], as well as providing a
> > > > > > Comparator<T>. (Better than providing a Comparator<byte[]> as it
> > only
> > > > > > costs once per row insert for deserialization, rather that twice on
> > > > > > each comparison).
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > I don't follow what the Factory adds.
> > > > >
> > > > > We're talking about getting HBASE-82 into 0.2.  Does that interfere
> > with
> > > > >
> > > > this proposal?  (I'm thinking that along w/ rows becoming byte arrays
> > rather
> > > > Text with a client-supplied Comparator, column qualifiers would shift to
> > be
> > > > byte arrays also -- though yeah, implies that if your sort is not
> > > > byte-lexicographical, yes, the compares can be costly involving two
> > > > deserializations per compare).
> > > >
> > > > >
> > > > >
> > > > > > - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> > > > > > which keeps the column values in order, and maps them back to row
> > > > > > keys. First cut may just keep all this in memory, but it should be
> > > > > > backed with MapFile(s).
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > Would be sweet if you could leverage the HBase memcache code and
> > flusher
> > > > >
> > > > to do the above.
> > > >
> > > > >
> > > > > This Map would be global for the table?  Or per Region?
> > > > >
> > > > > A lucene index wouldn't work for you because you want ordering?
> > > > >
> > > > >
> > > > >
> > > > > > - Add to the hregion the ability to scan through keys in column
> > order.
> > > > > > Just iterate through the SortedMap, run a filter on the key, and if
> > it
> > > > > > passes do a get on the row.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > You'd be random reading rows.  You're OK w/ current performance?  (For
> > > > >
> > > > sure it will only improve but....).
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > - Add a ColumnOrderedClientScanner which will open column order
> > > > > > scanners to all applicable hregions, and continuously pick row with
> > > > > > the lowest column value from each of the client scanners.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > This scanner would have a significant client-side component to do the
> > > > >
> > > > arbitrage between all regions to figure the lowest column value?  If you
> > had
> > > > a new type of 'region' -- one denoted by lowest and upper column then
> > the
> > > > client-side logic would fade away and your scanner would look like
> > current
> > > > scanners.
> > > >
> > > > >
> > > > >
> > > > > > - Region splits should be easy enough, just a scan through the
> > > > > > SortedMap to partition.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > Splits would not be row-based and run as they currently do, but rather
> > > > >
> > > > sorted-column based?
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > Of course, the index could also be used for more efficient querying
> > on
> > > > > > the indexed column's values.
> > > > > >
> > > > > > Do other users have a need for this functionality?
> > > > > >
> > > > > > What do developers think about this? I know hbase is more intended
> > for
> > > > > > back-end batch style processing, but we have this need.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > How are you thinking of adding in this new functionality?  Subclassing
> > > > >
> > > > HRegionServer?
> > > >
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > > Cheers,
> > > > > > -clint
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>



-- 
B. Regards,
Edward J. Yoon

Re: Secondary indexes

Posted by Clint Morgan <cl...@gmail.com>.
The way I picture it, we only need to check the 100 or 1000 regions to
get the very first row of a order-by scan. Afterwards, we just need to
get the next row from the region that we took the last row from,
compare with the existing rows we have, and return the lowest.

So if we have R regions to grab from, the first call to next() will
have to fetch all R rows from the regions and do R log R comparisons
to do the sort. Then each call to next() will cost 1 row fetch from
region plus log R comparisons to put in a sorted tree.


On Tue, Apr 22, 2008 at 1:10 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> If you just want the first 10 by a certain prefix ordered by a column, then
> you will definitely be better off scanning them by row and ordering them
> clientside.
>
>  Surely your idea of maintaining a separate SortedMap in each region would
> work, but I don't think you should discount the cost associated with having
> to talk to a bunch of different regions every time you want to know what the
> next row is. You'll do a lot of extra work to get that merged view of the
> index, and potentially the approach won't scale up to queries that might
> cover more than a "few" regions - can you imagine having to check 100 or
> 1000 regions for the next result every time you needed to iterate?
>
>
>
>  On Apr 22, 2008, at 12:58 PM, Clint Morgan wrote:
>
>
> > Yeah, that would be an easy approach. We would need HBASE-82.
> >
> > The main problem I see here is that we cannot take (as much) advantage
> > of our row key prefix in weeding out rows.
> >
> > Say I want the first 10 rows that start with XXX ordered by A:amount,
> > then I would have scan through column values from rows everywhere in
> > the table until I hit 10 with my prefix. Could be costly if table is
> > large compared to the number of rows that start with XXX.
> >
> > Whereas if we have one SortedMap per Region, then I can quickly narrow
> > down to (hopefully) a few regions based on key prefix.
> >
> > Though other usage / table loading patterns would surely benefit from
> > this approach...
> >
> > On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> >
> > > This doesn't have to be all that complicated.
> > >
> > >  Why not keep another HBase table as the index? The keys would be the
> column
> > > values. There'd be a single matchingRows column family, and the
> qualifiers
> > > and values would be the rows that match that column. Then, when you want
> to
> > > scan in column order instead of row order, you scan the index table,
> find
> > > the list of rows that match each column, and then do a random read to
> grab
> > > those individually. It'll for sure be slower than scanning a table
> ordered
> > > by rows, but it'll get you what you want. It'll also handle the case
> where
> > > the column values aren't unique.
> > >
> > >  If you need custom sorting for those values, then HBASE-82 would solve
> that
> > > problem.
> > >
> > >  -Bryan
> > >
> > >
> > >
> > >  On Apr 22, 2008, at 11:58 AM, stack wrote:
> > >
> > >
> > >
> > > > Some questions interlaced below:
> > > >
> > > > Clint Morgan wrote:
> > > >
> > > >
> > > > > All,
> > > > >
> > > > > We want to put secondary indexes into hbase. The motivation is that
> we
> > > > > are storing data in hbase that we want to serve to users. We would
> > > > > like to be able to serve rows sorted by column values. Our queries
> > > > > will be over rows with a given key prefix, so we should not be
> hitting
> > > > > to many regions.
> > > > >  I was thinking it would work roughly like this:
> > > > >
> > > > > - At table creation time, individual columns can be declared as
> > > > > indexed. By default we could sort the column values
> lexicographically,
> > > > > or we can provide a WritableComparatorFactory<T> which has the
> ability
> > > > > to make values of type T from a byte [], as well as providing a
> > > > > Comparator<T>. (Better than providing a Comparator<byte[]> as it
> only
> > > > > costs once per row insert for deserialization, rather that twice on
> > > > > each comparison).
> > > > >
> > > > >
> > > > >
> > > >
> > > > I don't follow what the Factory adds.
> > > >
> > > > We're talking about getting HBASE-82 into 0.2.  Does that interfere
> with
> > > >
> > > this proposal?  (I'm thinking that along w/ rows becoming byte arrays
> rather
> > > Text with a client-supplied Comparator, column qualifiers would shift to
> be
> > > byte arrays also -- though yeah, implies that if your sort is not
> > > byte-lexicographical, yes, the compares can be costly involving two
> > > deserializations per compare).
> > >
> > > >
> > > >
> > > > > - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> > > > > which keeps the column values in order, and maps them back to row
> > > > > keys. First cut may just keep all this in memory, but it should be
> > > > > backed with MapFile(s).
> > > > >
> > > > >
> > > > >
> > > >
> > > > Would be sweet if you could leverage the HBase memcache code and
> flusher
> > > >
> > > to do the above.
> > >
> > > >
> > > > This Map would be global for the table?  Or per Region?
> > > >
> > > > A lucene index wouldn't work for you because you want ordering?
> > > >
> > > >
> > > >
> > > > > - Add to the hregion the ability to scan through keys in column
> order.
> > > > > Just iterate through the SortedMap, run a filter on the key, and if
> it
> > > > > passes do a get on the row.
> > > > >
> > > > >
> > > > >
> > > >
> > > > You'd be random reading rows.  You're OK w/ current performance?  (For
> > > >
> > > sure it will only improve but....).
> > >
> > > >
> > > >
> > > >
> > > > > - Add a ColumnOrderedClientScanner which will open column order
> > > > > scanners to all applicable hregions, and continuously pick row with
> > > > > the lowest column value from each of the client scanners.
> > > > >
> > > > >
> > > > >
> > > >
> > > > This scanner would have a significant client-side component to do the
> > > >
> > > arbitrage between all regions to figure the lowest column value?  If you
> had
> > > a new type of 'region' -- one denoted by lowest and upper column then
> the
> > > client-side logic would fade away and your scanner would look like
> current
> > > scanners.
> > >
> > > >
> > > >
> > > > > - Region splits should be easy enough, just a scan through the
> > > > > SortedMap to partition.
> > > > >
> > > > >
> > > > >
> > > >
> > > > Splits would not be row-based and run as they currently do, but rather
> > > >
> > > sorted-column based?
> > >
> > > >
> > > >
> > > >
> > > > > Of course, the index could also be used for more efficient querying
> on
> > > > > the indexed column's values.
> > > > >
> > > > > Do other users have a need for this functionality?
> > > > >
> > > > > What do developers think about this? I know hbase is more intended
> for
> > > > > back-end batch style processing, but we have this need.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > How are you thinking of adding in this new functionality?  Subclassing
> > > >
> > > HRegionServer?
> > >
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > > Cheers,
> > > > > -clint
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>
>

Re: Secondary indexes

Posted by Bryan Duxbury <br...@rapleaf.com>.
If you just want the first 10 by a certain prefix ordered by a  
column, then you will definitely be better off scanning them by row  
and ordering them clientside.

Surely your idea of maintaining a separate SortedMap in each region  
would work, but I don't think you should discount the cost associated  
with having to talk to a bunch of different regions every time you  
want to know what the next row is. You'll do a lot of extra work to  
get that merged view of the index, and potentially the approach won't  
scale up to queries that might cover more than a "few" regions - can  
you imagine having to check 100 or 1000 regions for the next result  
every time you needed to iterate?

On Apr 22, 2008, at 12:58 PM, Clint Morgan wrote:

> Yeah, that would be an easy approach. We would need HBASE-82.
>
> The main problem I see here is that we cannot take (as much) advantage
> of our row key prefix in weeding out rows.
>
> Say I want the first 10 rows that start with XXX ordered by A:amount,
> then I would have scan through column values from rows everywhere in
> the table until I hit 10 with my prefix. Could be costly if table is
> large compared to the number of rows that start with XXX.
>
> Whereas if we have one SortedMap per Region, then I can quickly narrow
> down to (hopefully) a few regions based on key prefix.
>
> Though other usage / table loading patterns would surely benefit from
> this approach...
>
> On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <br...@rapleaf.com>  
> wrote:
>> This doesn't have to be all that complicated.
>>
>>  Why not keep another HBase table as the index? The keys would be  
>> the column
>> values. There'd be a single matchingRows column family, and the  
>> qualifiers
>> and values would be the rows that match that column. Then, when  
>> you want to
>> scan in column order instead of row order, you scan the index  
>> table, find
>> the list of rows that match each column, and then do a random read  
>> to grab
>> those individually. It'll for sure be slower than scanning a table  
>> ordered
>> by rows, but it'll get you what you want. It'll also handle the  
>> case where
>> the column values aren't unique.
>>
>>  If you need custom sorting for those values, then HBASE-82 would  
>> solve that
>> problem.
>>
>>  -Bryan
>>
>>
>>
>>  On Apr 22, 2008, at 11:58 AM, stack wrote:
>>
>>
>>> Some questions interlaced below:
>>>
>>> Clint Morgan wrote:
>>>
>>>> All,
>>>>
>>>> We want to put secondary indexes into hbase. The motivation is  
>>>> that we
>>>> are storing data in hbase that we want to serve to users. We would
>>>> like to be able to serve rows sorted by column values. Our queries
>>>> will be over rows with a given key prefix, so we should not be  
>>>> hitting
>>>> to many regions.
>>>>  I was thinking it would work roughly like this:
>>>>
>>>> - At table creation time, individual columns can be declared as
>>>> indexed. By default we could sort the column values  
>>>> lexicographically,
>>>> or we can provide a WritableComparatorFactory<T> which has the  
>>>> ability
>>>> to make values of type T from a byte [], as well as providing a
>>>> Comparator<T>. (Better than providing a Comparator<byte[]> as it  
>>>> only
>>>> costs once per row insert for deserialization, rather that twice on
>>>> each comparison).
>>>>
>>>>
>>>
>>> I don't follow what the Factory adds.
>>>
>>> We're talking about getting HBASE-82 into 0.2.  Does that  
>>> interfere with
>> this proposal?  (I'm thinking that along w/ rows becoming byte  
>> arrays rather
>> Text with a client-supplied Comparator, column qualifiers would  
>> shift to be
>> byte arrays also -- though yeah, implies that if your sort is not
>> byte-lexicographical, yes, the compares can be costly involving two
>> deserializations per compare).
>>>
>>>> - We catch all writes/deletes and maintain a SortedMap<T,  
>>>> HStoreKey>
>>>> which keeps the column values in order, and maps them back to row
>>>> keys. First cut may just keep all this in memory, but it should be
>>>> backed with MapFile(s).
>>>>
>>>>
>>>
>>> Would be sweet if you could leverage the HBase memcache code and  
>>> flusher
>> to do the above.
>>>
>>> This Map would be global for the table?  Or per Region?
>>>
>>> A lucene index wouldn't work for you because you want ordering?
>>>
>>>
>>>> - Add to the hregion the ability to scan through keys in column  
>>>> order.
>>>> Just iterate through the SortedMap, run a filter on the key, and  
>>>> if it
>>>> passes do a get on the row.
>>>>
>>>>
>>>
>>> You'd be random reading rows.  You're OK w/ current performance?   
>>> (For
>> sure it will only improve but....).
>>>
>>>
>>>> - Add a ColumnOrderedClientScanner which will open column order
>>>> scanners to all applicable hregions, and continuously pick row with
>>>> the lowest column value from each of the client scanners.
>>>>
>>>>
>>>
>>> This scanner would have a significant client-side component to do  
>>> the
>> arbitrage between all regions to figure the lowest column value?   
>> If you had
>> a new type of 'region' -- one denoted by lowest and upper column  
>> then the
>> client-side logic would fade away and your scanner would look like  
>> current
>> scanners.
>>>
>>>> - Region splits should be easy enough, just a scan through the
>>>> SortedMap to partition.
>>>>
>>>>
>>>
>>> Splits would not be row-based and run as they currently do, but  
>>> rather
>> sorted-column based?
>>>
>>>
>>>> Of course, the index could also be used for more efficient  
>>>> querying on
>>>> the indexed column's values.
>>>>
>>>> Do other users have a need for this functionality?
>>>>
>>>> What do developers think about this? I know hbase is more  
>>>> intended for
>>>> back-end batch style processing, but we have this need.
>>>>
>>>>
>>>>
>>> How are you thinking of adding in this new functionality?   
>>> Subclassing
>> HRegionServer?
>>>
>>> St.Ack
>>>
>>>
>>>> Cheers,
>>>> -clint
>>>>
>>>>
>>>
>>>
>>
>>


Re: Secondary indexes

Posted by Clint Morgan <cl...@gmail.com>.
Yeah, that would be an easy approach. We would need HBASE-82.

The main problem I see here is that we cannot take (as much) advantage
of our row key prefix in weeding out rows.

Say I want the first 10 rows that start with XXX ordered by A:amount,
then I would have scan through column values from rows everywhere in
the table until I hit 10 with my prefix. Could be costly if table is
large compared to the number of rows that start with XXX.

Whereas if we have one SortedMap per Region, then I can quickly narrow
down to (hopefully) a few regions based on key prefix.

Though other usage / table loading patterns would surely benefit from
this approach...

On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <br...@rapleaf.com> wrote:
> This doesn't have to be all that complicated.
>
>  Why not keep another HBase table as the index? The keys would be the column
> values. There'd be a single matchingRows column family, and the qualifiers
> and values would be the rows that match that column. Then, when you want to
> scan in column order instead of row order, you scan the index table, find
> the list of rows that match each column, and then do a random read to grab
> those individually. It'll for sure be slower than scanning a table ordered
> by rows, but it'll get you what you want. It'll also handle the case where
> the column values aren't unique.
>
>  If you need custom sorting for those values, then HBASE-82 would solve that
> problem.
>
>  -Bryan
>
>
>
>  On Apr 22, 2008, at 11:58 AM, stack wrote:
>
>
> > Some questions interlaced below:
> >
> > Clint Morgan wrote:
> >
> > > All,
> > >
> > > We want to put secondary indexes into hbase. The motivation is that we
> > > are storing data in hbase that we want to serve to users. We would
> > > like to be able to serve rows sorted by column values. Our queries
> > > will be over rows with a given key prefix, so we should not be hitting
> > > to many regions.
> > >  I was thinking it would work roughly like this:
> > >
> > > - At table creation time, individual columns can be declared as
> > > indexed. By default we could sort the column values lexicographically,
> > > or we can provide a WritableComparatorFactory<T> which has the ability
> > > to make values of type T from a byte [], as well as providing a
> > > Comparator<T>. (Better than providing a Comparator<byte[]> as it only
> > > costs once per row insert for deserialization, rather that twice on
> > > each comparison).
> > >
> > >
> >
> > I don't follow what the Factory adds.
> >
> > We're talking about getting HBASE-82 into 0.2.  Does that interfere with
> this proposal?  (I'm thinking that along w/ rows becoming byte arrays rather
> Text with a client-supplied Comparator, column qualifiers would shift to be
> byte arrays also -- though yeah, implies that if your sort is not
> byte-lexicographical, yes, the compares can be costly involving two
> deserializations per compare).
> >
> > > - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> > > which keeps the column values in order, and maps them back to row
> > > keys. First cut may just keep all this in memory, but it should be
> > > backed with MapFile(s).
> > >
> > >
> >
> > Would be sweet if you could leverage the HBase memcache code and flusher
> to do the above.
> >
> > This Map would be global for the table?  Or per Region?
> >
> > A lucene index wouldn't work for you because you want ordering?
> >
> >
> > > - Add to the hregion the ability to scan through keys in column order.
> > > Just iterate through the SortedMap, run a filter on the key, and if it
> > > passes do a get on the row.
> > >
> > >
> >
> > You'd be random reading rows.  You're OK w/ current performance?  (For
> sure it will only improve but....).
> >
> >
> > > - Add a ColumnOrderedClientScanner which will open column order
> > > scanners to all applicable hregions, and continuously pick row with
> > > the lowest column value from each of the client scanners.
> > >
> > >
> >
> > This scanner would have a significant client-side component to do the
> arbitrage between all regions to figure the lowest column value?  If you had
> a new type of 'region' -- one denoted by lowest and upper column then the
> client-side logic would fade away and your scanner would look like current
> scanners.
> >
> > > - Region splits should be easy enough, just a scan through the
> > > SortedMap to partition.
> > >
> > >
> >
> > Splits would not be row-based and run as they currently do, but rather
> sorted-column based?
> >
> >
> > > Of course, the index could also be used for more efficient querying on
> > > the indexed column's values.
> > >
> > > Do other users have a need for this functionality?
> > >
> > > What do developers think about this? I know hbase is more intended for
> > > back-end batch style processing, but we have this need.
> > >
> > >
> > >
> > How are you thinking of adding in this new functionality?  Subclassing
> HRegionServer?
> >
> > St.Ack
> >
> >
> > > Cheers,
> > > -clint
> > >
> > >
> >
> >
>
>

Re: Secondary indexes

Posted by Bryan Duxbury <br...@rapleaf.com>.
This doesn't have to be all that complicated.

Why not keep another HBase table as the index? The keys would be the  
column values. There'd be a single matchingRows column family, and  
the qualifiers and values would be the rows that match that column.  
Then, when you want to scan in column order instead of row order, you  
scan the index table, find the list of rows that match each column,  
and then do a random read to grab those individually. It'll for sure  
be slower than scanning a table ordered by rows, but it'll get you  
what you want. It'll also handle the case where the column values  
aren't unique.

If you need custom sorting for those values, then HBASE-82 would  
solve that problem.

-Bryan

On Apr 22, 2008, at 11:58 AM, stack wrote:

> Some questions interlaced below:
>
> Clint Morgan wrote:
>> All,
>>
>> We want to put secondary indexes into hbase. The motivation is  
>> that we
>> are storing data in hbase that we want to serve to users. We would
>> like to be able to serve rows sorted by column values. Our queries
>> will be over rows with a given key prefix, so we should not be  
>> hitting
>> to many regions.
>>   I was thinking it would work roughly like this:
>>
>> - At table creation time, individual columns can be declared as
>> indexed. By default we could sort the column values  
>> lexicographically,
>> or we can provide a WritableComparatorFactory<T> which has the  
>> ability
>> to make values of type T from a byte [], as well as providing a
>> Comparator<T>. (Better than providing a Comparator<byte[]> as it only
>> costs once per row insert for deserialization, rather that twice on
>> each comparison).
>>
>
> I don't follow what the Factory adds.
>
> We're talking about getting HBASE-82 into 0.2.  Does that interfere  
> with this proposal?  (I'm thinking that along w/ rows becoming byte  
> arrays rather Text with a client-supplied Comparator, column  
> qualifiers would shift to be byte arrays also -- though yeah,  
> implies that if your sort is not byte-lexicographical, yes, the  
> compares can be costly involving two deserializations per compare).
>> - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
>> which keeps the column values in order, and maps them back to row
>> keys. First cut may just keep all this in memory, but it should be
>> backed with MapFile(s).
>>
>
> Would be sweet if you could leverage the HBase memcache code and  
> flusher to do the above.
>
> This Map would be global for the table?  Or per Region?
>
> A lucene index wouldn't work for you because you want ordering?
>
>> - Add to the hregion the ability to scan through keys in column  
>> order.
>> Just iterate through the SortedMap, run a filter on the key, and  
>> if it
>> passes do a get on the row.
>>
>
> You'd be random reading rows.  You're OK w/ current performance?   
> (For sure it will only improve but....).
>
>> - Add a ColumnOrderedClientScanner which will open column order
>> scanners to all applicable hregions, and continuously pick row with
>> the lowest column value from each of the client scanners.
>>
>
> This scanner would have a significant client-side component to do  
> the arbitrage between all regions to figure the lowest column  
> value?  If you had a new type of 'region' -- one denoted by lowest  
> and upper column then the client-side logic would fade away and  
> your scanner would look like current scanners.
>> - Region splits should be easy enough, just a scan through the
>> SortedMap to partition.
>>
>
> Splits would not be row-based and run as they currently do, but  
> rather sorted-column based?
>
>> Of course, the index could also be used for more efficient  
>> querying on
>> the indexed column's values.
>>
>> Do other users have a need for this functionality?
>>
>> What do developers think about this? I know hbase is more intended  
>> for
>> back-end batch style processing, but we have this need.
>>
>>
> How are you thinking of adding in this new functionality?   
> Subclassing HRegionServer?
>
> St.Ack
>
>> Cheers,
>> -clint
>>
>


Re: Secondary indexes

Posted by stack <st...@duboce.net>.
Some questions interlaced below:

Clint Morgan wrote:
> All,
>
> We want to put secondary indexes into hbase. The motivation is that we
> are storing data in hbase that we want to serve to users. We would
> like to be able to serve rows sorted by column values. Our queries
> will be over rows with a given key prefix, so we should not be hitting
> to many regions.
>   
> I was thinking it would work roughly like this:
>
> - At table creation time, individual columns can be declared as
> indexed. By default we could sort the column values lexicographically,
> or we can provide a WritableComparatorFactory<T> which has the ability
> to make values of type T from a byte [], as well as providing a
> Comparator<T>. (Better than providing a Comparator<byte[]> as it only
> costs once per row insert for deserialization, rather that twice on
> each comparison).
>   

I don't follow what the Factory adds.

We're talking about getting HBASE-82 into 0.2.  Does that interfere with 
this proposal?  (I'm thinking that along w/ rows becoming byte arrays 
rather Text with a client-supplied Comparator, column qualifiers would 
shift to be byte arrays also -- though yeah, implies that if your sort 
is not byte-lexicographical, yes, the compares can be costly involving 
two deserializations per compare).
> - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> which keeps the column values in order, and maps them back to row
> keys. First cut may just keep all this in memory, but it should be
> backed with MapFile(s).
>   

Would be sweet if you could leverage the HBase memcache code and flusher 
to do the above.

This Map would be global for the table?  Or per Region?

A lucene index wouldn't work for you because you want ordering?

> - Add to the hregion the ability to scan through keys in column order.
> Just iterate through the SortedMap, run a filter on the key, and if it
> passes do a get on the row.
>   

You'd be random reading rows.  You're OK w/ current performance?  (For 
sure it will only improve but....).

> - Add a ColumnOrderedClientScanner which will open column order
> scanners to all applicable hregions, and continuously pick row with
> the lowest column value from each of the client scanners.
>   

This scanner would have a significant client-side component to do the 
arbitrage between all regions to figure the lowest column value?  If you 
had a new type of 'region' -- one denoted by lowest and upper column 
then the client-side logic would fade away and your scanner would look 
like current scanners.
> - Region splits should be easy enough, just a scan through the
> SortedMap to partition.
>   

Splits would not be row-based and run as they currently do, but rather 
sorted-column based?

> Of course, the index could also be used for more efficient querying on
> the indexed column's values.
>
> Do other users have a need for this functionality?
>
> What do developers think about this? I know hbase is more intended for
> back-end batch style processing, but we have this need.
>
>   
How are you thinking of adding in this new functionality?  Subclassing 
HRegionServer?

St.Ack

> Cheers,
> -clint
>   


Re: Secondary indexes

Posted by Jun Rao <ju...@almaden.ibm.com>.
The proposal sounds interesting.

Will the indexes be maintained in the same "transaction" of an update? So,
if an update to a row is successful, but index maintenance fails, would you
roll back the row update?

Are you also considering composite (multi-column) indexes like those used
in Google's App Engine?

We have been thinking about adding to HBase a Lucene-based text index,
which can be maintained asynchronously with the table. Are you interested
in text index too?

Jun

clint.a.m@gmail.com wrote on 04/22/2008 10:31:02 AM:

> All,
>
> We want to put secondary indexes into hbase. The motivation is that we
> are storing data in hbase that we want to serve to users. We would
> like to be able to serve rows sorted by column values. Our queries
> will be over rows with a given key prefix, so we should not be hitting
> to many regions.
>
> I was thinking it would work roughly like this:
>
> - At table creation time, individual columns can be declared as
> indexed. By default we could sort the column values lexicographically,
> or we can provide a WritableComparatorFactory<T> which has the ability
> to make values of type T from a byte [], as well as providing a
> Comparator<T>. (Better than providing a Comparator<byte[]> as it only
> costs once per row insert for deserialization, rather that twice on
> each comparison).
>
> - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey>
> which keeps the column values in order, and maps them back to row
> keys. First cut may just keep all this in memory, but it should be
> backed with MapFile(s).
>
> - Add to the hregion the ability to scan through keys in column order.
> Just iterate through the SortedMap, run a filter on the key, and if it
> passes do a get on the row.
>
> - Add a ColumnOrderedClientScanner which will open column order
> scanners to all applicable hregions, and continuously pick row with
> the lowest column value from each of the client scanners.
>
> - Region splits should be easy enough, just a scan through the
> SortedMap to partition.
>
> Of course, the index could also be used for more efficient querying on
> the indexed column's values.
>
> Do other users have a need for this functionality?
>
> What do developers think about this? I know hbase is more intended for
> back-end batch style processing, but we have this need.
>
> Cheers,
> -clint