You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2013/05/16 22:56:01 UTC

Question about HFile seeking

Lets say I have the following in my table:

            col1
row1       v1          ------> HFile entry would be "row1,col1,ts1-->v1"
             ol1
row1c     v2          ------> HFile entry would be "row1c,ol1,ts1-->v2"

Now I issue a prefix scan asking row for row "row1c", how do we seek - do
we seek directly to row1c or would we seek to row1 first and then to row1c.
The reason being that the HFile keys are the same for both the keys. I
simply absorb one character from the column into the row.

Thanks
Varun

Re: Question about HFile seeking

Posted by Stack <st...@duboce.net>.
What is your query?

If scanning over rows of 100k, yeah, you will go through each row's content
unless you specify you are only interested in some subset of the rows.
 Then a 'skipping' facility will cut where we will use the index to skip
over unwanted content.

St.Ack



On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <va...@pinterest.com> wrote:

> Nothing, I am just curious...
>
> So, we will do a bunch of wasteful scanning - that's lets say row1 has col1
> - col100000 - basically 100K columns, we will scan all those key values
> even though we are going to discard them, is that correct ?
>
>
> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>
> > What you seeing Varun (or think you are seeing)?
> > St.Ack
> >
> >
> > On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com>
> > wrote:
> > >
> > >> Or do we use some kind of demarcator b/w rows and columns and
> timestamps
> > >> when building the HFile keys and the indices ?
> > >>
> > >
> > > No demarcation but in KeyValue, we keep row, column family name, column
> > > family qualifier, etc., lengths and offsets so the comparators on ly
> > > compare pertinent bytes.
> > >
> > > If you doing a prefix scan w/ row1c, we should be starting the scan at
> > > row1c, not row1 (or more correctly at the row that starts the block we
> > > believe has a row1c row in it...).
> > >
> > > St.Ack
> > >
> >
>

Re: Question about HFile seeking

Posted by Stack <st...@duboce.net>.
On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <va...@pinterest.com> wrote:

> Referring to your comment above again
>
> "If you doing a prefix scan w/ row1c, we should be starting the scan at
> row1c, not row1 (or more correctly at the row that starts the block we
> believe has a row1c row in it...)"
>
> I am trying to understand how you could seek right across to the block
> containing "row1c" using the HFile Index. If the index is just built on
> HFile keys and there is no demarcation b/w rows and col(s), you would hit
> the block for "row1,col1". After that you would either need a way to skip
> right across to "row1c" after you find that this is not the row you are
> looking for or you will have to simply keep scanning and discarding
> sequentially until you get "row1c". If you have to keep scanning and
> discarding, then that is probably suboptimal. But if there is a way to skip
> right across from "row1,col1" to "row1c", then thats great, though I wonder
> how that would be implemented.
>


There is no demarcation but the serialized row key in the index has the
lengths of each of the row key constituent parts embedded.

Here is where we write out an entry:

http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.html#292

The entry is a KeyValue instances.  See how we pass the key and value
component separately into here:

http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.html#325

The key part is serialized in a particular format.  It is described here:

http://hbase.apache.org/xref/org/apache/hadoop/hbase/KeyValue.html#58

(or see http://hbase.apache.org/book.html#keyvalue, where it is better
described... easier to read).

Notice how the serialization has row length, column family length,
qualifier length, etc.

While in KV, see how it has comparators that compare byte arrays that have
serialized keys in them first by row, then by column family, etc: i.e. by
each of the component parts in turn:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/KeyValue.html#2515

Here is where we save off the key if it is first in current block:

http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.html#353

It gets added to index when we figure we have a block full of data (search
checkBlockBoundary in HFileWriterV2).

So keys in index are full-on keys.

At seek time, we eventually get down to an index lookup.  See here in hfile
package:

http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.html#176

Notice how there is a comparator and a byte array w/ a key in it.  The
comparator we read from the file.  It was comparator used writing the file.
 If you wander through this seeking method ignoring its handling of index
tiers, you'll end up eventually in the binary search of the index using our
key-conscious comparator:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.html#482(Our
KeyComparator over in KeyValue implements RawComparator so the
RawComparator you see here is a base type -- we have different comparators
dependent on whether it user data or system table files... that is another
story).

Hope this gives you some insight on how we could skip to 'row1c' w/o going
through the content of 'row1' first.

You have a perf problem?  Maybe dump out content of a few hfiles that you
know you are having speed issues against.  It can be informative?

St.Ack

Re: Question about HFile seeking

Posted by lars hofhansl <la...@apache.org>.
We use the index blocks to find the right block (we have 64k blocks by default). Once we found the block we a linear search for the KV we're looking for.

In your example, you'd find the first block that contains a KV for row1c and then seek into that block until you found your KV. We cannot do binary search, since all KVs can have different sizes, so we have to decode their length one by one in order to find the next. In my test, this linear search has not shown as a performance concern.

If you scan along very wide rows and but you select only a few column we typically skipscan directly to the next row after the last interesting column was hit.
(And now looking at the code I see that we can probably optimize wide rows with seek hints - we know the next column, so we can seek directly to that one, rather than going considering every column).


-- Lars



----- Original Message -----
From: Varun Sharma <va...@pinterest.com>
To: user@hbase.apache.org; Michael Stack <st...@cloudera.com>
Cc: 
Sent: Thursday, May 16, 2013 3:26 PM
Subject: Re: Question about HFile seeking

Referring to your comment above again

"If you doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...)"

I am trying to understand how you could seek right across to the block
containing "row1c" using the HFile Index. If the index is just built on
HFile keys and there is no demarcation b/w rows and col(s), you would hit
the block for "row1,col1". After that you would either need a way to skip
right across to "row1c" after you find that this is not the row you are
looking for or you will have to simply keep scanning and discarding
sequentially until you get "row1c". If you have to keep scanning and
discarding, then that is probably suboptimal. But if there is a way to skip
right across from "row1,col1" to "row1c", then thats great, though I wonder
how that would be implemented.

Varun



On Thu, May 16, 2013 at 2:55 PM, Varun Sharma <va...@pinterest.com> wrote:

> Sorry I may have misunderstood what you meant.
>
> When you look for "row1c" in the HFile index - is it going to also match
> for "row1,col1" or only match "row1c". It all depends how the index is
> organized, if its only on HFile keys, it could also match row1,col1 unless
> we use some demarcator b/w row1 and col1 in our HFile keys. So I am just
> wondering if we will totally skip touch row1,col1 in this case and jump
> straight to row1c or not. The other option is that we would actually hit
> row1,col1 since the prefix matches row1c when looking at the HFile key and
> then, we look at the length of the row to grab the real portion from the
> concatenated HFile key and discard all row1 entries.
>
> Does that make my query clearer ?
>
>
> On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Nothing, I am just curious...
>>
>> So, we will do a bunch of wasteful scanning - that's lets say row1 has
>> col1 - col100000 - basically 100K columns, we will scan all those key
>> values even though we are going to discard them, is that correct ?
>>
>>
>> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>>
>>> What you seeing Varun (or think you are seeing)?
>>> St.Ack
>>>
>>>
>>> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>>>
>>> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com>
>>> wrote:
>>> >
>>> >> Or do we use some kind of demarcator b/w rows and columns and
>>> timestamps
>>> >> when building the HFile keys and the indices ?
>>> >>
>>> >
>>> > No demarcation but in KeyValue, we keep row, column family name, column
>>> > family qualifier, etc., lengths and offsets so the comparators on ly
>>> > compare pertinent bytes.
>>> >
>>> > If you doing a prefix scan w/ row1c, we should be starting the scan at
>>> > row1c, not row1 (or more correctly at the row that starts the block we
>>> > believe has a row1c row in it...).
>>> >
>>> > St.Ack
>>> >
>>>
>>
>>
>


Re: Question about HFile seeking

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Generally we start with seeking on all the Hfiles corresponding to the
region and load the blocks that correspond to that row key specified in the
scan.

If row1 and row1c are in the same block then we may start with row1.  If
they are in different blocks then we will start with the block containing
row1c.

Also as the Prefixfilter is getting used here so once we have hit the first
row in the block we keep scanning till the filterRowKey() says we have
arrived at a row that matches the prefix.
One more thing it will do is once the prefix matches are over (this will
happen because they are lexographically sorted) we will ignore all other
keys greater than what prefixfilter needs.

Regards
Ram


On Fri, May 17, 2013 at 12:52 PM, Varun Sharma <va...@pinterest.com> wrote:

> Thanks Stack and Lars for the detailed answers - This question is not
> really motivated by performance problems...
>
> So the index indeed knows what part of the HFile key is the row and which
> part is the column qualifier. Thats what I needed to know. I initially
> thought it saw it as an opaque concatenated key (row+col.
> qualifier+timestamp) in which case, it would be difficult to run prefix
> scans since prefixes could potentially bleed across row and col.
>
> Varun
>
>
> On Thu, May 16, 2013 at 11:54 PM, Michael Stack <st...@cloudera.com>
> wrote:
>
> > On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <va...@pinterest.com>
> wrote:
> >
> >> Referring to your comment above again
> >>
> >> "If you doing a prefix scan w/ row1c, we should be starting the scan at
> >> row1c, not row1 (or more correctly at the row that starts the block we
> >> believe has a row1c row in it...)"
> >>
> >> I am trying to understand how you could seek right across to the block
> >> containing "row1c" using the HFile Index. If the index is just built on
> >> HFile keys and there is no demarcation b/w rows and col(s), you would
> hit
> >> the block for "row1,col1". After that you would either need a way to
> skip
> >> right across to "row1c" after you find that this is not the row you are
> >> looking for or you will have to simply keep scanning and discarding
> >> sequentially until you get "row1c". If you have to keep scanning and
> >> discarding, then that is probably suboptimal. But if there is a way to
> skip
> >> right across from "row1,col1" to "row1c", then thats great, though I
> wonder
> >> how that would be implemented.
> >>
> >> (ugh... meant to send the below at 5pm but see i didn't send it...
> > anyways... see mailing list.. hopefully helps)
> >
> >
> > The hfile index looks like an opaque byte array but it actually has a
> > strong format.  In KV we have comparators that will look at this byte
> array
> > and exploit the format to tease apart row from column from qualifier.
> >
> > I have to run just now.  Will give you better answer this evening up on
> > list.
> >
> > St.Ack
> >
>

Re: Question about HFile seeking

Posted by Varun Sharma <va...@pinterest.com>.
Thanks Stack and Lars for the detailed answers - This question is not
really motivated by performance problems...

So the index indeed knows what part of the HFile key is the row and which
part is the column qualifier. Thats what I needed to know. I initially
thought it saw it as an opaque concatenated key (row+col.
qualifier+timestamp) in which case, it would be difficult to run prefix
scans since prefixes could potentially bleed across row and col.

Varun


On Thu, May 16, 2013 at 11:54 PM, Michael Stack <st...@cloudera.com> wrote:

> On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Referring to your comment above again
>>
>> "If you doing a prefix scan w/ row1c, we should be starting the scan at
>> row1c, not row1 (or more correctly at the row that starts the block we
>> believe has a row1c row in it...)"
>>
>> I am trying to understand how you could seek right across to the block
>> containing "row1c" using the HFile Index. If the index is just built on
>> HFile keys and there is no demarcation b/w rows and col(s), you would hit
>> the block for "row1,col1". After that you would either need a way to skip
>> right across to "row1c" after you find that this is not the row you are
>> looking for or you will have to simply keep scanning and discarding
>> sequentially until you get "row1c". If you have to keep scanning and
>> discarding, then that is probably suboptimal. But if there is a way to skip
>> right across from "row1,col1" to "row1c", then thats great, though I wonder
>> how that would be implemented.
>>
>> (ugh... meant to send the below at 5pm but see i didn't send it...
> anyways... see mailing list.. hopefully helps)
>
>
> The hfile index looks like an opaque byte array but it actually has a
> strong format.  In KV we have comparators that will look at this byte array
> and exploit the format to tease apart row from column from qualifier.
>
> I have to run just now.  Will give you better answer this evening up on
> list.
>
> St.Ack
>

Re: Question about HFile seeking

Posted by Varun Sharma <va...@pinterest.com>.
Referring to your comment above again

"If you doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...)"

I am trying to understand how you could seek right across to the block
containing "row1c" using the HFile Index. If the index is just built on
HFile keys and there is no demarcation b/w rows and col(s), you would hit
the block for "row1,col1". After that you would either need a way to skip
right across to "row1c" after you find that this is not the row you are
looking for or you will have to simply keep scanning and discarding
sequentially until you get "row1c". If you have to keep scanning and
discarding, then that is probably suboptimal. But if there is a way to skip
right across from "row1,col1" to "row1c", then thats great, though I wonder
how that would be implemented.

Varun



On Thu, May 16, 2013 at 2:55 PM, Varun Sharma <va...@pinterest.com> wrote:

> Sorry I may have misunderstood what you meant.
>
> When you look for "row1c" in the HFile index - is it going to also match
> for "row1,col1" or only match "row1c". It all depends how the index is
> organized, if its only on HFile keys, it could also match row1,col1 unless
> we use some demarcator b/w row1 and col1 in our HFile keys. So I am just
> wondering if we will totally skip touch row1,col1 in this case and jump
> straight to row1c or not. The other option is that we would actually hit
> row1,col1 since the prefix matches row1c when looking at the HFile key and
> then, we look at the length of the row to grab the real portion from the
> concatenated HFile key and discard all row1 entries.
>
> Does that make my query clearer ?
>
>
> On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Nothing, I am just curious...
>>
>> So, we will do a bunch of wasteful scanning - that's lets say row1 has
>> col1 - col100000 - basically 100K columns, we will scan all those key
>> values even though we are going to discard them, is that correct ?
>>
>>
>> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>>
>>> What you seeing Varun (or think you are seeing)?
>>> St.Ack
>>>
>>>
>>> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>>>
>>> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com>
>>> wrote:
>>> >
>>> >> Or do we use some kind of demarcator b/w rows and columns and
>>> timestamps
>>> >> when building the HFile keys and the indices ?
>>> >>
>>> >
>>> > No demarcation but in KeyValue, we keep row, column family name, column
>>> > family qualifier, etc., lengths and offsets so the comparators on ly
>>> > compare pertinent bytes.
>>> >
>>> > If you doing a prefix scan w/ row1c, we should be starting the scan at
>>> > row1c, not row1 (or more correctly at the row that starts the block we
>>> > believe has a row1c row in it...).
>>> >
>>> > St.Ack
>>> >
>>>
>>
>>
>

Re: Question about HFile seeking

Posted by Varun Sharma <va...@pinterest.com>.
Sorry I may have misunderstood what you meant.

When you look for "row1c" in the HFile index - is it going to also match
for "row1,col1" or only match "row1c". It all depends how the index is
organized, if its only on HFile keys, it could also match row1,col1 unless
we use some demarcator b/w row1 and col1 in our HFile keys. So I am just
wondering if we will totally skip touch row1,col1 in this case and jump
straight to row1c or not. The other option is that we would actually hit
row1,col1 since the prefix matches row1c when looking at the HFile key and
then, we look at the length of the row to grab the real portion from the
concatenated HFile key and discard all row1 entries.

Does that make my query clearer ?


On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <va...@pinterest.com> wrote:

> Nothing, I am just curious...
>
> So, we will do a bunch of wasteful scanning - that's lets say row1 has
> col1 - col100000 - basically 100K columns, we will scan all those key
> values even though we are going to discard them, is that correct ?
>
>
> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>
>> What you seeing Varun (or think you are seeing)?
>> St.Ack
>>
>>
>> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>>
>> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com>
>> wrote:
>> >
>> >> Or do we use some kind of demarcator b/w rows and columns and
>> timestamps
>> >> when building the HFile keys and the indices ?
>> >>
>> >
>> > No demarcation but in KeyValue, we keep row, column family name, column
>> > family qualifier, etc., lengths and offsets so the comparators on ly
>> > compare pertinent bytes.
>> >
>> > If you doing a prefix scan w/ row1c, we should be starting the scan at
>> > row1c, not row1 (or more correctly at the row that starts the block we
>> > believe has a row1c row in it...).
>> >
>> > St.Ack
>> >
>>
>
>

Re: Question about HFile seeking

Posted by Varun Sharma <va...@pinterest.com>.
Nothing, I am just curious...

So, we will do a bunch of wasteful scanning - that's lets say row1 has col1
- col100000 - basically 100K columns, we will scan all those key values
even though we are going to discard them, is that correct ?


On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:

> What you seeing Varun (or think you are seeing)?
> St.Ack
>
>
> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>
> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com>
> wrote:
> >
> >> Or do we use some kind of demarcator b/w rows and columns and timestamps
> >> when building the HFile keys and the indices ?
> >>
> >
> > No demarcation but in KeyValue, we keep row, column family name, column
> > family qualifier, etc., lengths and offsets so the comparators on ly
> > compare pertinent bytes.
> >
> > If you doing a prefix scan w/ row1c, we should be starting the scan at
> > row1c, not row1 (or more correctly at the row that starts the block we
> > believe has a row1c row in it...).
> >
> > St.Ack
> >
>

Re: Question about HFile seeking

Posted by Stack <st...@duboce.net>.
What you seeing Varun (or think you are seeing)?
St.Ack


On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:

> On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com> wrote:
>
>> Or do we use some kind of demarcator b/w rows and columns and timestamps
>> when building the HFile keys and the indices ?
>>
>
> No demarcation but in KeyValue, we keep row, column family name, column
> family qualifier, etc., lengths and offsets so the comparators on ly
> compare pertinent bytes.
>
> If you doing a prefix scan w/ row1c, we should be starting the scan at
> row1c, not row1 (or more correctly at the row that starts the block we
> believe has a row1c row in it...).
>
> St.Ack
>

Re: Question about HFile seeking

Posted by Stack <st...@duboce.net>.
On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com> wrote:

> Or do we use some kind of demarcator b/w rows and columns and timestamps
> when building the HFile keys and the indices ?
>

No demarcation but in KeyValue, we keep row, column family name, column
family qualifier, etc., lengths and offsets so the comparators on ly
compare pertinent bytes.

If you doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...).

St.Ack

Re: Question about HFile seeking

Posted by Varun Sharma <va...@pinterest.com>.
Or do we use some kind of demarcator b/w rows and columns and timestamps
when building the HFile keys and the indices ?

Thanks
Varun


On Thu, May 16, 2013 at 1:56 PM, Varun Sharma <va...@pinterest.com> wrote:

> Lets say I have the following in my table:
>
>             col1
> row1       v1          ------> HFile entry would be "row1,col1,ts1-->v1"
>              ol1
> row1c     v2          ------> HFile entry would be "row1c,ol1,ts1-->v2"
>
> Now I issue a prefix scan asking row for row "row1c", how do we seek - do
> we seek directly to row1c or would we seek to row1 first and then to row1c.
> The reason being that the HFile keys are the same for both the keys. I
> simply absorb one character from the column into the row.
>
> Thanks
> Varun
>