Posted to dev@hbase.apache.org by Lars George <la...@gmail.com> on 2010/11/23 16:25:42 UTC

HRegion.RegionScanner.nextInternal()

Hi,

I am officially confused:

          byte [] nextRow;
          do {
            this.storeHeap.next(results, limit - results.size());
            if (limit > 0 && results.size() == limit) {
              if (this.filter != null && filter.hasFilterRow()) throw new IncompatibleFilterException(
                  "Filter with filterRow(List<KeyValue>) incompatible with scan with limit!");
              return true; // we are expecting more yes, but also limited to how many we can return.
            }
          } while (Bytes.equals(currentRow, nextRow = peekRow()));

This is from the nextInternal() call. Questions:

a) Why is that check for the filter and limit both being set inside the loop?

b) If "limit" is the batch size (which for a Get is "-1", not "1" as I
would have thought) then what does that "limit - results.size()"
achieve?

I mean, this loop gets all columns for a given row, so batch/limit
should not be handled here, right? What if limit were set to "1" by
the client? Then even if the Get had 3 columns to retrieve it would
not be able to since this limit makes it bail out. So there would be
multiple calls to nextInternal() to complete what could be done in one
loop?

Eh?

Lars

Re: HRegion.RegionScanner.nextInternal()

Posted by Ryan Rawson <ry...@gmail.com>.
Yes, in this case 'batch' and 'limit' refer to how many cells to return
at a time within a row. The 'scanner caching' comes across in the
next(int) argument, which can change on a per-call basis (although the
HTable API doesn't quite allow it).

-ryan
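
To make the two knobs concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: the class BatchVsCachingSketch and its methods nextChunk() and nextCall() are made-up names, and it assumes that each inner step yields one chunk of at most 'batch' cells from a single row while each outer call collects up to 'caching' such chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of batch (cells per chunk) versus caching (chunks per call). Not HBase code. */
class BatchVsCachingSketch {
    private final List<List<String>> rows; // each inner list holds the cells of one non-empty row
    private final int batch;               // max cells per chunk; <= 0 means the whole row
    private int row = 0;                   // index of the row being read
    private int col = 0;                   // index of the next cell within that row

    BatchVsCachingSketch(List<List<String>> rows, int batch) {
        this.rows = rows;
        this.batch = batch;
    }

    /** Inner step: at most `batch` cells, never crossing a row boundary. Empty when exhausted. */
    List<String> nextChunk() {
        List<String> chunk = new ArrayList<>();
        if (row >= rows.size()) {
            return chunk;
        }
        List<String> cells = rows.get(row);
        while (col < cells.size() && (batch <= 0 || chunk.size() < batch)) {
            chunk.add(cells.get(col++));
        }
        if (col >= cells.size()) { // row finished, move on to the next one
            row++;
            col = 0;
        }
        return chunk;
    }

    /** Outer step: one simulated round trip that returns up to `caching` chunks. */
    List<List<String>> nextCall(int caching) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < caching; i++) {
            List<String> chunk = nextChunk();
            if (chunk.isEmpty()) {
                break;
            }
            out.add(chunk);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("r1/a", "r1/b", "r1/c"),
            Arrays.asList("r2/a", "r2/b"));
        BatchVsCachingSketch scanner = new BatchVsCachingSketch(rows, 2);
        System.out.println(scanner.nextCall(2)); // [[r1/a, r1/b], [r1/c]]
        System.out.println(scanner.nextCall(2)); // [[r2/a, r2/b]]
    }
}
```

With batch = 2 the three-cell row comes back as two chunks, and the per-call count applies to chunks rather than to whole rows, which is the distinction drawn above.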

On Fri, Nov 26, 2010 at 3:12 AM, Lars George <la...@gmail.com> wrote:
> OK, got it. I missed the HRegionServer.next() in the mix. It calls
> the RegionScanner.next(results) and that uses the batch. Tricksy! I
> should have started on the client side instead.
>
> Lars

Re: HRegion.RegionScanner.nextInternal()

Posted by Lars George <la...@gmail.com>.
OK, got it. I missed the HRegionServer.next() in the mix. It calls
the RegionScanner.next(results) and that uses the batch. Tricksy! I
should have started on the client side instead.

Lars

On Fri, Nov 26, 2010 at 3:08 AM, Ryan Rawson <ry...@gmail.com> wrote:
> No, batch size when limit is set is 1. You get partial results for a row,
> then get more from the same row. Then the next row.

Re: HRegion.RegionScanner.nextInternal()

Posted by Ryan Rawson <ry...@gmail.com>.
No, batch size when limit is set is 1. You get partial results for a row,
then get more from the same row. Then the next row.
On Nov 25, 2010 4:54 PM, "Lars George" <la...@gmail.com> wrote:
> Mkay, I will look into it more for the latter. But the limit is still
> confusing to me, as limit == batch, and on the client side that is the
> number of rows, not the number of columns. Does that mean that if I had
> 100 columns and set batch to 10, it would only return 10 rows with 10
> columns, and not what I would have expected, i.e. 10 rows with all
> columns? Does this implicitly mean batch is also the intra-row batch size?
>
> Lars

Re: HRegion.RegionScanner.nextInternal()

Posted by Lars George <la...@gmail.com>.
Mkay, I will look into it more for the latter. But the limit is still confusing to me, as limit == batch, and on the client side that is the number of rows, not the number of columns. Does that mean that if I had 100 columns and set batch to 10, it would only return 10 rows with 10 columns, and not what I would have expected, i.e. 10 rows with all columns? Does this implicitly mean batch is also the intra-row batch size?

Lars
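
Taking the numbers from the question above (100 columns, batch of 10) together with the reply elsewhere in this thread that batch is the intra-row chunk size, the arithmetic can be sketched as follows. This is only a back-of-the-envelope model: the class and method names are made up, every row is assumed to have the same positive number of columns, and the per-call count is assumed to apply to chunks rather than to whole rows.

```java
/** Back-of-the-envelope arithmetic for intra-row batching. A sketch, not HBase code. */
class BatchArithmeticSketch {
    /** Chunks needed to return one row of `columns` cells with at most `batch` cells per chunk. */
    static int chunksPerRow(int columns, int batch) {
        if (batch <= 0) {
            return 1;                         // no intra-row limit: the whole row is one chunk
        }
        return (columns + batch - 1) / batch; // ceiling division
    }

    /** Complete rows covered by `chunks` consecutive chunks, assuming equally wide rows. */
    static int rowsCovered(int chunks, int columns, int batch) {
        return chunks / chunksPerRow(columns, batch);
    }

    public static void main(String[] args) {
        System.out.println(chunksPerRow(100, 10));    // 10 chunks for one 100-column row
        System.out.println(rowsCovered(10, 100, 10)); // 1 row covered by 10 chunks
        System.out.println(rowsCovered(10, 100, -1)); // 10 rows when no batch is set
    }
}
```

Under these assumptions, ten chunks of ten columns cover a single row, not ten rows.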

On Nov 25, 2010, at 21:53, Ryan Rawson <ry...@gmail.com> wrote:

> limit is for retrieving partial results of a row, i.e. give me a row
> in chunks. Filters that want to operate on the entire row cannot be
> used with this mode. I forget why it's in the loop, but there was a
> good reason at the time.
> 
> -ryan

Re: HRegion.RegionScanner.nextInternal()

Posted by Ryan Rawson <ry...@gmail.com>.
limit is for retrieving partial results of a row, i.e. give me a row
in chunks. Filters that want to operate on the entire row cannot be
used with this mode. I forget why it's in the loop, but there was a
good reason at the time.

-ryan
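
The chunking behaviour can be tried out with a small stand-alone model of the quoted loop. This is a sketch in plain Java, not HBase code: LimitCheckSketch, its fields and oneRow() are made-up names, a flat list of {row, column} pairs stands in for the store heap, and the sketch adds one cell per pass where the quoted code asks for up to (limit - results.size()) cells. It does not settle why the check was placed inside the loop; it only shows one observable consequence of that placement, namely that the exception fires only on a pass where the chunk fills up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the quoted do/while loop. Not HBase code; all names are hypothetical. */
class LimitCheckSketch {
    private final List<String[]> cells; // each entry is {row, column}, grouped by row
    private final int limit;            // max cells per call; <= 0 means no limit
    private final boolean hasRowFilter; // stands in for filter.hasFilterRow()
    private int pos = 0;                // index of the next unread cell

    LimitCheckSketch(List<String[]> cells, int limit, boolean hasRowFilter) {
        this.cells = cells;
        this.limit = limit;
        this.hasRowFilter = hasRowFilter;
    }

    /** Fills results with cells of the current row; returns true if more may follow. */
    boolean next(List<String> results) {
        if (pos >= cells.size()) {
            return false;
        }
        String currentRow = cells.get(pos)[0];
        do {
            String[] cell = cells.get(pos++);
            results.add(cell[0] + "/" + cell[1]);
            if (limit > 0 && results.size() == limit) {
                // Only reached when the chunk is full, i.e. the row may be split.
                if (hasRowFilter) {
                    throw new IllegalStateException("row filter incompatible with limit");
                }
                return true;
            }
        } while (pos < cells.size() && cells.get(pos)[0].equals(currentRow));
        return pos < cells.size();
    }

    /** Three cells in a single row, used by the examples below. */
    static List<String[]> oneRow() {
        return Arrays.asList(
            new String[] {"r1", "a"}, new String[] {"r1", "b"}, new String[] {"r1", "c"});
    }

    public static void main(String[] args) {
        // limit = 1: the three-cell row needs three calls.
        LimitCheckSketch chunked = new LimitCheckSketch(oneRow(), 1, false);
        for (int call = 1; call <= 3; call++) {
            List<String> results = new ArrayList<>();
            chunked.next(results);
            System.out.println("call " + call + ": " + results);
        }
        // No limit: a row filter is fine and the whole row arrives in one call.
        List<String> whole = new ArrayList<>();
        new LimitCheckSketch(oneRow(), -1, true).next(whole);
        System.out.println("no limit, row filter: " + whole);
    }
}
```

In this model a limit of 1 does force three calls for a three-cell row, as the original question suspected, and a row filter combined with a limit larger than the row never triggers the exception.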

On Thu, Nov 25, 2010 at 10:51 AM, Lars George <la...@gmail.com> wrote:
> Does hbase-dev still get forwarded? Did you see the below message?

Fwd: HRegion.RegionScanner.nextInternal()

Posted by Lars George <la...@gmail.com>.
Does hbase-dev still get forwarded? Did you see the below message?

---------- Forwarded message ----------
From: Lars George <la...@gmail.com>
Date: Tue, Nov 23, 2010 at 4:25 PM
Subject: HRegion.RegionScanner.nextInternal()
To: hbase-dev@hadoop.apache.org

Hi,

I am officially confused:

         byte [] nextRow;
         do {
           this.storeHeap.next(results, limit - results.size());
           if (limit > 0 && results.size() == limit) {
             if (this.filter != null && filter.hasFilterRow()) throw new IncompatibleFilterException(
                 "Filter with filterRow(List<KeyValue>) incompatible with scan with limit!");
             return true; // we are expecting more yes, but also limited to how many we can return.
           }
         } while (Bytes.equals(currentRow, nextRow = peekRow()));

This is from the nextInternal() call. Questions:

a) Why is that check for the filter and limit both being set inside the loop?

b) If "limit" is the batch size (which for a Get is "-1", not "1" as I
would have thought) then what does that "limit - results.size()"
achieve?

I mean, this loop gets all columns for a given row, so batch/limit
should not be handled here, right? What if limit were set to "1" by
the client? Then even if the Get had 3 columns to retrieve it would
not be able to since this limit makes it bail out. So there would be
multiple calls to nextInternal() to complete what could be done in one
loop?

Eh?

Lars