Posted to user@hbase.apache.org by Whitney Sorenson <ws...@hubspot.com> on 2012/07/17 17:11:40 UTC

Scan only talks to a single region server

I'm trying to scan across an entire table (using only a specific
family or family + qualifier).

I've tried various methods but I can only get this scan to touch the
first region server. Afterward, it stops processing. Issuing the same
scan in the shell works (returns 50,000 rows) whereas the Scan made
from Java only returns ~4000 rows.

I've tried adding/removing start/stop rows, using getScanner(family,
column) vs getScanner(scan), and restarting the region servers which
host the 1st and 2nd regions.

The debug output from the scan shows that it knows about locations for
each region; however, it calls close after the first region.

In the simplest case, the code looks like:

ResultScanner rs = table.getScanner(family, qualifier);
for (Result r : rs) {
  // do something
}

Any ideas or known issues? (0.90.4-cdh3u2 - this scan is running
inside a map task)

I figure the next step is to walk through the client scanner code
locally in a Java main method, but I haven't done this yet.

Re: Scan only talks to a single region server

Posted by Jimmy Xiang <jx...@cloudera.com>.
Hi Whitney,

The scanner automatically moves on to the next region (and its region
server) once the current region has been fully scanned.

In the client, can HTable.getStartEndKeys() see all the regions and
region servers?

Thanks,
Jimmy
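
One way to act on the question above is to check that the region boundaries the client sees form an unbroken chain over the key space. The sketch below is a standalone helper with no HBase dependency; the class and method names are hypothetical. It assumes the two arrays are sorted by start key and would come from something like the first and second halves of the pair returned by HTable.getStartEndKeys(), and that an empty byte array marks the open start of the first region and the open end of the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: sanity-checks region boundaries. */
class RegionCoverageCheck {

    /**
     * Returns true when the regions form one contiguous chain over the whole
     * key space: the first start key and the last end key are empty, and each
     * region's end key equals the next region's start key.
     */
    static boolean coversWholeKeySpace(byte[][] startKeys, byte[][] endKeys) {
        if (startKeys.length == 0 || startKeys.length != endKeys.length) {
            return false;
        }
        int last = startKeys.length - 1;
        if (startKeys[0].length != 0 || endKeys[last].length != 0) {
            return false;
        }
        for (int i = 0; i < last; i++) {
            if (!Arrays.equals(endKeys[i], startKeys[i + 1])) {
                return false; // gap or overlap between region i and region i + 1
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up boundaries for three regions: [ , g), [g, p), [p, ).
        byte[] g = "g".getBytes(StandardCharsets.UTF_8);
        byte[] p = "p".getBytes(StandardCharsets.UTF_8);
        byte[][] starts = { {}, g, p };
        byte[][] ends = { g, p, {} };
        System.out.println("regions seen: " + starts.length
                + ", contiguous: " + coversWholeKeySpace(starts, ends));
    }
}
```

If the client reports only one region, or the chain has a gap, that would point at the client's view of the table rather than at the scan loop itself.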

Re: Scan only talks to a single region server

Posted by Whitney Sorenson <ws...@hubspot.com>.
The code is pasted above; here it is again:

ResultScanner rs = table.getScanner(family, qualifier);
for (Result r : rs) {
  // do something
}

ResultScanners are iterable, which means you can use a for-each loop
over them. In addition, the debug logs indicate that the scanner only
ever retrieves rows from the first region server.

Re: Scan only talks to a single region server

Posted by Alex Baranau <al...@gmail.com>.
> How do you create your scan(ner)? Could you paste the code here?

Sorry, I meant to ask how you instantiate the HTable and configuration objects.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

Re: Scan only talks to a single region server

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi,

I'm not 100% sure, but I think getScanner returns a ResultScanner and
not the results themselves.

What you need to do is something like:


ResultScanner scanner = table_work_proposed.getScanner(scan);
Result[] results = scanner.next(linesToRead);
while (results.length > 0)
{
    for (Result result : results)
    {
        // Do something or nothing
        byte[] row = result.getRow();
    }
    results = scanner.next(linesToRead);
}

In your example, I think you are counting the result scanners, not the rows.

JM
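
For comparison, the two loop shapes discussed in this thread (fetching batches with next(n) versus a for-each loop) can be tried side by side on a toy stand-in. ToyScanner below is hypothetical and is not an HBase class; it only mimics the shape of an iterable scanner that also offers a batched next(n), with rows represented as plain strings. Under that assumption, both loops visit each row exactly once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a scanner; rows are plain strings. */
class ToyScanner implements Iterable<String> {
    private final List<String> rows;
    private int pos = 0; // cursor shared by next(n) and the iterator

    ToyScanner(List<String> rows) {
        this.rows = rows;
    }

    /** Returns up to n remaining rows; an empty list means the scan is exhausted. */
    List<String> next(int n) {
        int end = Math.min(pos + n, rows.size());
        List<String> batch = new ArrayList<>(rows.subList(pos, end));
        pos = end;
        return batch;
    }

    /** Iterates over the remaining rows one at a time. */
    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return pos < rows.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return rows.get(pos++);
            }
        };
    }

    /** Batched loop: keep asking for linesToRead rows until a batch comes back empty. */
    static int countBatched(ToyScanner scanner, int linesToRead) {
        int count = 0;
        List<String> results = scanner.next(linesToRead);
        while (!results.isEmpty()) {
            count += results.size();
            results = scanner.next(linesToRead);
        }
        return count;
    }

    /** For-each loop: each iteration yields one row, not one scanner. */
    static int countForEach(ToyScanner scanner) {
        int count = 0;
        for (String row : scanner) {
            count++;
        }
        return count;
    }
}
```

In this toy model the choice between the two loops affects only how many rows are handled per call, not how many rows are seen in total.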

Re: Scan only talks to a single region server

Posted by Alex Baranau <al...@gmail.com>.
> this scan is running
> inside a map task

How do you create your scan(ner)? Could you paste the code here?

As you may know, when an HBase table is used as the source for a MapReduce
job (via the standard configuration), each map task consumes data from one
region (among other things, this tries to benefit from data locality). That
is, the job creates one map task per region. I wonder if this is related.

Sorry for the obvious check...

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
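
The one-map-task-per-region behaviour described above can be illustrated with a toy model. The sketch below is not the actual table input format code; ToySplits and its Split type are hypothetical, row keys are plain strings, and an empty string stands for an open boundary. It produces one split per region that overlaps the scan range, clipped to that range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "one split per region"; not HBase code. */
class ToySplits {

    /** One split covering [startRow, stopRow), where "" means unbounded. */
    static final class Split {
        final String startRow;
        final String stopRow;

        Split(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** Returns one split per region that overlaps the scan range [scanStart, scanStop). */
    static List<Split> splitsFor(String[] regionStarts, String[] regionEnds,
                                 String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            String regionStart = regionStarts[i];
            String regionEnd = regionEnds[i];
            // The region overlaps the scan if the scan starts before the region
            // ends and stops after the region starts.
            boolean startsBeforeRegionEnd = regionEnd.isEmpty() || scanStart.compareTo(regionEnd) < 0;
            boolean stopsAfterRegionStart = scanStop.isEmpty() || scanStop.compareTo(regionStart) > 0;
            if (startsBeforeRegionEnd && stopsAfterRegionStart) {
                // Clip the split to the tighter of the two boundaries on each side.
                String start = scanStart.compareTo(regionStart) > 0 ? scanStart : regionStart;
                boolean scanStopsFirst = regionEnd.isEmpty()
                        || (!scanStop.isEmpty() && scanStop.compareTo(regionEnd) < 0);
                String stop = scanStopsFirst ? scanStop : regionEnd;
                splits.add(new Split(start, stop));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up boundaries for three regions: [ , g), [g, p), [p, ).
        String[] starts = { "", "g", "p" };
        String[] ends = { "g", "p", "" };
        for (Split s : splitsFor(starts, ends, "", "")) {
            System.out.println("[" + s.startRow + ", " + s.stopRow + ")");
        }
    }
}
```

In this model a full-table scan over three regions yields three splits, so a mapper that only reads the rows handed to it by its own split would see a single region's worth of data.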
