Posted to user@hbase.apache.org by cu...@g.pl on 2008/05/13 23:00:07 UTC

long time scanner.iterator().hasNext()

 Hi,
 I use Hadoop 0.17 and HBase 0.2.0 dev.
 I noticed strange behavior in HBase: when I create a scanner and call
scanner.iterator().hasNext(), it takes a very long time to answer.
 It's really strange: this happens for only one table. The same test on
other tables shows that scanner.iterator().hasNext() takes very little
time.

  Measurements:

       Table          : index
       number of rows : about 65000
       scanner from startId and with the PageRowFilter : limit 10

    creating HTable : 183 ms
    creating filter : 1 ms
    creating scanner: 13 ms

    Iterator iterator = scanner.iterator() : 0 ms
    iterator.hasNext(): 2843 ms
    iterator.next(): 1 ms

  For the other table, data:

       Table          : data
       number of rows : about 65000
       scanner from startId and with the PageRowFilter : limit 10

    creating HTable : 213 ms
    creating filter : 1 ms
    creating scanner: 13 ms

    Iterator iterator = scanner.iterator() : 1 ms
    iterator.hasNext(): 8 ms
    iterator.next(): 1 ms


There is only one difference between the index and data tables: the format of the rowId.

 example rowId for table "data" :

    5487987413120499077,4767446346789789789da13dfga-d524-gmlw-354m-hjkjwer42abr

 example rowId for table "index" :

     ennm,Dvboemepw,userssearch,a,9223372036854775807,9223372036854775807ffffffff-ffff-ffff-ffff-ffffffffffff


Please help me: where is the problem? Why does checking for the next element
take so long for the index table?
Is the problem the rowId format?

 Thanks - Acure
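
The per-call timings reported above can be collected with a small wrapper around any java.util.Iterator. The following is only a sketch: IteratorTiming, Sample, and timeHasNext are hypothetical names, not part of HBase, and a plain list iterator stands in for scanner.iterator().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of HBase: times individual iterator calls.
final class IteratorTiming {

    /** Result of one hasNext() call together with how long it took. */
    static final class Sample {
        final boolean hasNext;
        final long elapsedNanos;

        Sample(boolean hasNext, long elapsedNanos) {
            this.hasNext = hasNext;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Times a single hasNext() call using the monotonic clock. */
    static Sample timeHasNext(Iterator<?> it) {
        long start = System.nanoTime();
        boolean result = it.hasNext();
        return new Sample(result, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for scanner.iterator(); any Iterator works here.
        List<String> rows = Arrays.asList("row-a", "row-b", "row-c");
        Iterator<String> it = rows.iterator();
        int call = 0;
        while (true) {
            Sample s = timeHasNext(it);
            System.out.printf("hasNext() #%d -> %b in %.3f ms%n",
                    call++, s.hasNext, s.elapsedNanos / 1e6);
            if (!s.hasNext) {
                break;
            }
            it.next();
        }
    }
}
```

Timing every call, rather than only the first, helps answer whether the delay occurs on each hasNext() or only once.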

RE: long time scanner.iterator().hasNext()

Posted by Jim Kellerman <ji...@powerset.com>.

There is currently a bug in scanners, to be addressed in HBASE-538, in which under some circumstances the scanner is not closed and advanced properly. This may be the problem you are seeing. Is the table with the long hasNext time contained in a single region?

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: cure@g.pl [mailto:cure@g.pl]
> Sent: Tuesday, May 13, 2008 11:44 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: long time scanner.iterator().hasNext()
>
>
> > Any chance of your figuring where the time is being spent?  Do you
> > have the facility to add dumb logging, recompile, restart cluster,
> > and then check?   If you want me to send you a patch to get you
> > going, just say so.
>
>  I have a hypothesis, but it's only speculation:
>
>    I think that when I call scanner.iterator().hasNext(), it has to
> check all rows in the table to find (or not find) the next rowId,
> because the hasNext time rises linearly with the table size. Maybe
> the problem is in how rowIds are sorted in the table?
>
>    I will try changing the rowId format.
>
>    I will check out the latest version of HBase and run the same
> test today.
>
>    Acure.
>
>

Re: long time scanner.iterator().hasNext()

Posted by cu...@g.pl.

Hi

>>  I have a hypothesis, but it's only speculation:
>>
>>    I think that when I call scanner.iterator().hasNext(), it has to check
>> all rows in the table to find (or not find) the next rowId, because the
>> hasNext time rises linearly with the table size. Maybe the problem is in
>> how rowIds are sorted in the table?
>>
> Scanners step through the memcache and files in the filesystem.
> When you call hasNext, open iterators are moved to the next row.  There
> should not be full region/table scanning going on per hasNext call.
>

   Yes, I see, but I checked it, and the time is longer when I have more
rows in the table. There is a linear dependence between the number of rows
and the query time.

   I can try to make a sample application, if you want.

> Are you using filters by any chance?

    Yes, but I have tried different filters: PageRowFilter,
StopRowFilter, and my own, PageStopRowFilter. It doesn't change the
hasNext time.

> Thanks for doing this investigation.  Which version are you currently
> on?  0.1.2?
>
    I use 0.2.0 dev from trunk (but not a really fresh version; it is
two or three weeks old).

  Antony
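
One general way an iterator can show a slow hasNext() and a fast next() is look-ahead: hasNext() has to skip every row a filter rejects before it can answer, while next() only hands back the row already found. The sketch below illustrates that pattern with plain Java collections. It is not HBase's scanner code, and FilteringIterator and rowsExamined are hypothetical names introduced for illustration; whether this is what happens in the index table is not established in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical look-ahead iterator; not taken from HBase.
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> accept;
    private T lookahead;
    private boolean ready;
    long rowsExamined; // counts rows pulled from the source

    FilteringIterator(Iterator<T> source, Predicate<T> accept) {
        this.source = source;
        this.accept = accept;
    }

    @Override
    public boolean hasNext() {
        // All the skipping happens here: rejected rows are consumed
        // until one passes the filter or the source runs out.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            rowsExamined++;
            if (accept.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        T result = lookahead;
        lookahead = null;
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 65000; i++) {
            rows.add(i);
        }
        // Only the last ten rows pass the filter.
        FilteringIterator<Integer> it =
                new FilteringIterator<>(rows.iterator(), r -> r >= 64990);
        it.hasNext();
        System.out.println("rows examined by first hasNext(): " + it.rowsExamined);
        it.next();
        System.out.println("rows examined after next(): " + it.rowsExamined);
    }
}
```

In this pattern the cost of hasNext() grows with the number of rows that precede the first accepted row, which would produce the kind of linear dependence described above.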

Re: long time scanner.iterator().hasNext()

Posted by stack <st...@duboce.net>.

cure@g.pl wrote:
>> Any chance of your figuring where the time is being spent?  Do you have
>> the facility to add dumb logging, recompile, restart cluster, and then
>> check?   If you want me to send you a patch to get you going, just say so.
>>     
>
>  I have a hypothesis, but it's only speculation:
>
>    I think that when I call scanner.iterator().hasNext(), it has to check
> all rows in the table to find (or not find) the next rowId, because the
> hasNext time rises linearly with the table size. Maybe the problem is in
> how rowIds are sorted in the table?
>   
Scanners step through the memcache and files in the filesystem.  
When you call hasNext, open iterators are moved to the next row.  There 
should not be full region/table scanning going on per hasNext call.

Are you using filters by any chance?

>    I will try changing the rowId format.
>
>    I will check out the latest version of HBase and run the same test
> today.
>   

Thanks for doing this investigation.  Which version are you currently 
on?  0.1.2?

St.Ack
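
The description above, with one open iterator per sorted source (memcache and each file), resembles a k-way merge. The following is a minimal sketch of that idea, assuming each source yields row keys in sorted order. MergingScanner and Cursor are hypothetical names, this is not HBase's actual scanner, duplicate keys across sources are not collapsed, and in this sketch the advance happens in next() rather than in hasNext().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical k-way merge over sorted sources; not taken from HBase.
final class MergingScanner implements Iterator<String> {

    /** One open source iterator plus the row key it is positioned on. */
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<String> rest;
        String current;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // Cursors ordered by the row key each one is currently positioned on.
    private final PriorityQueue<Cursor> heap = new PriorityQueue<>();

    MergingScanner(List<Iterator<String>> sortedSources) {
        for (Iterator<String> it : sortedSources) {
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heap.isEmpty();
    }

    @Override
    public String next() {
        Cursor smallest = heap.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        String row = smallest.current;
        // Advance only the source that supplied this row.
        if (smallest.rest.hasNext()) {
            smallest.current = smallest.rest.next();
            heap.add(smallest);
        }
        return row;
    }

    public static void main(String[] args) {
        Iterator<String> memcache = Arrays.asList("b", "e").iterator();
        Iterator<String> file1 = Arrays.asList("a", "d").iterator();
        Iterator<String> file2 = Arrays.asList("c", "f").iterator();
        MergingScanner scanner =
                new MergingScanner(Arrays.asList(memcache, file1, file2));
        while (scanner.hasNext()) {
            System.out.print(scanner.next() + " ");
        }
        System.out.println();
    }
}
```

With this structure each step touches one source and does work proportional to the logarithm of the number of sources, not to the number of rows, which is why a per-call full scan would be unexpected.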

Re: long time scanner.iterator().hasNext()

Posted by cu...@g.pl.

> Any chance of your figuring where the time is being spent?  Do you have
> the facility to add dumb logging, recompile, restart cluster, and then
> check?   If you want me to send you a patch to get you going, just say so.

 I have a hypothesis, but it's only speculation:

   I think that when I call scanner.iterator().hasNext(), it has to check
all rows in the table to find (or not find) the next rowId, because the
hasNext time rises linearly with the table size. Maybe the problem is in
how rowIds are sorted in the table?

   I will try changing the rowId format.

   I will check out the latest version of HBase and run the same test
today.

   Acure.
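
For comparison with the hypothesis above: when row keys are held in sorted order, starting at a given key and reading a page of ten rows does not require visiting the rows before the start key. The sketch below shows this with java.util.TreeMap standing in for a sorted table. SortedSeek and page are hypothetical names, the row keys are made up, and String ordering is used here only as a stand-in for however the store compares keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of seeking in a sorted map; not HBase code.
final class SortedSeek {

    /** Returns up to limit row keys that are >= startRow, in sorted order. */
    static List<String> page(NavigableMap<String, String> table,
                             String startRow, int limit) {
        List<String> out = new ArrayList<>();
        // tailMap is a view positioned at startRow; earlier keys are not visited.
        for (String key : table.tailMap(startRow, true).keySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 65000; i++) {
            table.put(String.format("row%05d", i), "value");
        }
        System.out.println(page(table, "row64000", 10));
    }
}
```

If the measured time still grows with table size under such a layout, that points at something other than the sort order itself, for example work done per call on rows that are later discarded.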

Re: long time scanner.iterator().hasNext()

Posted by stack <st...@duboce.net>.

Any chance of your figuring where the time is being spent?  Do you have 
the facility to add dumb logging, recompile, restart cluster, and then 
check?   If you want me to send you a patch to get you going, just say so.
St.Ack

cure@g.pl wrote:
>> Both have same-sized cell values.
>
>     yes.
>
>   Acure

Re: long time scanner.iterator().hasNext()

Posted by cu...@g.pl.

> cure@g.pl wrote:

> Do both have same-sized cell values?

    Yes.

  Acure

Re: long time scanner.iterator().hasNext()

Posted by cu...@g.pl.

> cure@g.pl wrote:
>>  Hi,
>>  I use Hadoop 0.17 and HBase 0.2.0 dev.
>>  I noticed strange behavior in HBase: when I create a scanner and call
>> scanner.iterator().hasNext(), it takes a very long time to answer.
>>
> Every time you call hasNext?

    Yes, for this table.

> Is it the table with the shorter keys that has the problem?

    No. I have a very large table, and the hasNext time is about 5 - 8 ms.

> Do both have same-sized cell values?

    Yes.

Re: long time scanner.iterator().hasNext()

Posted by stack <st...@duboce.net>.

cure@g.pl wrote:
>  Hi,
>  I use Hadoop 0.17 and HBase 0.2.0 dev.
>  I noticed strange behavior in HBase: when I create a scanner and call
> scanner.iterator().hasNext(), it takes a very long time to answer.
>   
Every time you call hasNext?

Is it the table with the shorter keys that has the problem?

Do both have same-sized cell values?

St.Ack
