Posted to user@hbase.apache.org by Arun Mishra <ar...@me.com> on 2015/06/06 21:54:04 UTC

Query on OutOfOrderScannerNextException

Hello,

I have a query on OutOfOrderScannerNextException. I am using HBase 0.98.6 with 45 nodes.

I have a mapreduce job that scans one table for the last day's worth of data using a timerange. It had been running for months without any failure, but for the last couple of days it has been failing with the exception below. I have traced the failure to a single region. This region has 1 store and 1 hfile of 5+ GB. What we realized was that we had been writing some bulk data that used to land on this region. After we stopped writing that data, the region has been receiving very few writes per day.

When the mapreduce job runs, it creates a map task for this region, and that task fails with OutOfOrderScannerNextException. I was able to reproduce the error by running a scan command with the same start/stop row and timerange options. Finally, we split the region to be small enough for the scan command to work.

My query is whether there is any option, apart from increasing the timeout, which can solve this use case. I am thinking of a use case where data comes in for 3 days a week in bulk and then nothing for the next 3 days, in effect creating a data hole in the region.
My understanding is that I am hitting this error because I have big store files, and a timerange scan reads the entire file even though it contains very few rowkeys for that timerange.

hbase.client.scanner.caching = 100
hbase.client.scanner.timeout.period = 60s
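For completeness, these two settings are configured on the client side in hbase-site.xml; note that the timeout property takes milliseconds, so 60s is 60000:

```xml
<!-- hbase-site.xml (client side); values are the ones in effect above -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <!-- milliseconds, i.e. 60s -->
  <value>60000</value>
</property>
```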

scan 'dummytable',{ STARTROW=>'dummyrowkey-start', STOPROW=>'dummyrowkey-end', LIMIT=>1000, TIMERANGE=>[1433462400000,1433548800000]}
ROW                                           COLUMN+CELL                                                                                                                         

ERROR: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 33648 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3193)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:745)


Regards,
Arun

Re: Query on OutOfOrderScannerNextException

Posted by Anoop John <an...@gmail.com>.
The reason we throw this exception is as below.

Yes, you are looking for a few rows from a big region. It takes time to fill the number of rows requested by the client side, and by that time the client gets an RPC timeout. So the client side will retry the call on the same scanner. Remember, with this next call the client says "give me the next N rows from where you are". The old, failed call was still in progress and would have advanced past some rows, so the retry call would miss those rows. To avoid this, and to distinguish this case, we have the scan seqno and this exception. On seeing it, the client will close the scanner and create a new one with the proper start row. But this retry happens only one more time, and that call might also time out, so you really have to adjust the timeout and/or the scan caching value. Yes, the heartbeat mechanism avoids such timeouts for long-running scans. Hope this
explanation helps.
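A toy model of the seqno handshake described above (illustrative names only, not the actual HBase source): the server accepts a next() call only when the client's sequence number matches its own, so a retried call that the server already advanced past fails fast instead of silently skipping rows.

```java
// Toy sketch of the nextCallSeq check; class and method names are
// illustrative, not taken from HBase itself.
public class ScannerSeqCheck {
    private long nextCallSeq = 0;

    /**
     * Returns true and advances the sequence if the call is in order.
     * HBase instead throws OutOfOrderScannerNextException on a mismatch,
     * prompting the client to close the scanner and reopen from the last
     * row it actually received.
     */
    public boolean next(long clientSeq) {
        if (clientSeq != nextCallSeq) {
            return false; // retried call after a timed-out-but-completed one
        }
        nextCallSeq++; // the rows for this batch would be gathered here
        return true;
    }

    public static void main(String[] args) {
        ScannerSeqCheck scanner = new ScannerSeqCheck();
        boolean first = scanner.next(0); // first call succeeds, seq -> 1
        // Client timed out and retries with the SAME seq (0): out of order.
        boolean retry = scanner.next(0);
        System.out.println(first + " " + retry); // prints "true false"
    }
}
```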

Anoop



Re: Query on OutOfOrderScannerNextException

Posted by Arun Mishra <ar...@me.com>.
Thanks Vladimir. I am using option 2 as a short-term fix for now. I will definitely look into key design.

Regards,
Arun.


Re: Query on OutOfOrderScannerNextException

Posted by Vladimir Rodionov <vl...@gmail.com>.
The scanner fails at the very beginning. The reason is that it needs very few
rows from a large file, and HBase has to fill the RPC buffer (which is 100
rows here, yes?) before it can return the first batch. This takes more than
60 sec, so the scanner fails (do not ask me why it's not the timeout
exception).

1. HBASE-13090 will help (it can, I presume, be backported to 1.0 and 0.98.x)
2. A smaller region size will help
3. A smaller hbase.client.scanner.caching will help
4. A larger hbase.client.scanner.timeout.period will help
5. Better data store design (rowkeys) is preferred.

Too many options to choose from.

-Vlad
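As an illustration of option 3, the shell repro from earlier in the thread can be retried with a smaller per-call batch; the CACHING attribute overrides hbase.client.scanner.caching for that one scan, so each RPC has to gather fewer rows before returning:

```
scan 'dummytable', { STARTROW => 'dummyrowkey-start', STOPROW => 'dummyrowkey-end',
  LIMIT => 1000, CACHING => 10, TIMERANGE => [1433462400000, 1433548800000] }
```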



Re: Query on OutOfOrderScannerNextException

Posted by Arun Mishra <ar...@me.com>.
Thanks TED.

Regards,
Arun.


Re: Query on OutOfOrderScannerNextException

Posted by Ted Yu <yu...@gmail.com>.
HBASE-13090 'Progress heartbeats for long running scanners' solves the
problem you faced.

It is in the 1.1.0 release.

FYI
