Posted to user@hbase.apache.org by Daniel Leffel <da...@gmail.com> on 2008/07/22 20:00:34 UTC

Bug in 0.1.3 scanner?

I've had a little bit of weird behavior.

I am opening a scanner in the configure method of a Map task to load a
simple little in-memory map (I'd love to do this with in-memory column stores,
but that's another story ;-).

Anyway, the scanner iterates over 4 million rows or so, and I've verified
next is being called multiple times per second. However, at the same record
every time (after around 6 minutes of iteration), the scanner throws an
UnknownScannerException (I can see it in the regionserver log), but rather than
throw the exception to the Map task, next just returns false and the loop
ends (making it exceptionally difficult to detect whether the scanner finished
scanning or not). Increasing the regionserver timeout has most definitely
solved the issue, but shouldn't calling next reset the timeout anyway?

Such a frustrating problem!
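
For reference, a minimal sketch of the pattern being described here - a scanner
opened in a Mapper's configure() to preload an in-memory lookup map - written
against the 0.1.x-era client API as best as it can be reconstructed; the table
name, the exact constructor/obtainScanner signatures, and the reuse of the
"hierarcy:immediate_parents" column that shows up later in this thread are
assumptions, not a verified excerpt:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Sketch only: the Mapper interface and map() method are omitted for brevity.
public class LookupLoadingMapper extends MapReduceBase {
  private final Map<String, String> parents = new HashMap<String, String>();

  @Override
  public void configure(JobConf job) {
    try {
      // Hypothetical table name; HTable/obtainScanner per the 0.1.x-era API.
      HTable table = new HTable(new HBaseConfiguration(), new Text("hierarchy_table"));
      HScannerInterface scanner =
          table.obtainScanner(new Text[] { new Text("hierarcy:") }, new Text());
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      try {
        // Each next() fetches one row; the scanner lease on the regionserver
        // is expected to be renewed while next() keeps being called.
        while (scanner.next(key, row)) {
          byte[] value = row.get(new Text("hierarcy:immediate_parents"));
          if (value != null) {
            parents.put(key.getRow().toString(), new Text(value).toString());
          }
          row.clear();
        }
      } finally {
        scanner.close();
      }
    } catch (IOException e) {
      // UnknownScannerException subclasses IOException, so a dropped lease
      // should surface here rather than silently ending the loop.
      throw new RuntimeException("failed to preload lookup map", e);
    }
  }
}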

Re: Bug in 0.1.3 scanner?

Posted by Daniel Leffel <da...@gmail.com>.
Hi Stack, I started to migrate to 0.2.0 anyway in hopes of solving the
problem (and getting the 0.2.0 benefits at the same time).

I can tell you what I was seeing though. With the default lease timeout, it
was always the first region transition. After I upped the lease timeout much
higher (10 minutes), it managed to get through almost all regions (with the
same amount of pause in between). However, the second to last region took
slightly longer than 10 minutes and I got the same effect.
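
As a rough illustration of the bump described above, assuming the property name
for that era was hbase.regionserver.lease.period (in milliseconds) - an
assumption, and since the regionserver is the side enforcing the lease, the
value generally has to reach the server's hbase-site.xml rather than only the
client configuration:

import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeaseTuning {
  // Lease period in milliseconds; 10 minutes, matching the value tried above.
  private static final long TEN_MINUTES_MS = 10 * 60 * 1000L;

  public static HBaseConfiguration withLongScannerLease() {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Property name is an assumption for the 0.1.x/0.2.0 era; the regionserver
    // enforces the lease, so the same setting must be in its hbase-site.xml.
    conf.setLong("hbase.regionserver.lease.period", TEN_MINUTES_MS);
    return conf;
  }
}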

I'll let you know how 0.2.0 goes - I'm having some EC2-related issues
with respect to binding to IPs. More on that later...

Danny



On Wed, Jul 23, 2008 at 3:50 PM, stack <st...@duboce.net> wrote:

> [...]

Re: Bug in 0.1.3 scanner?

Posted by stack <st...@duboce.net>.
I'd expect a little pause as the client goes and sets up a scanner on the
next region, but your log seems to show the client waiting so long that
when it eventually gets through, the server says it knows nothing of the
lease.   What if you up the client lease timeout?  I wonder if filters
are tripping things up?  Is it the first region transition or sometime
after the first/second transition?   Were the 'other scanners on other
tasks' that were frozen trying to get the same region?  Do you know the
problematic region?  You need to pull on reporter.progress in your
map task if its completion takes longer than the TaskTracker lease interval.
St.Ack
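
A minimal sketch of the keep-alive being described, using the old mapred API;
the key/value types and where exactly progress() gets called are arbitrary
choices for illustration:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SlowRowMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // ... expensive per-record work would go here ...

    // Tell the TaskTracker the task is still alive; without this, a record
    // that takes longer than the task timeout gets the attempt killed.
    reporter.progress();

    output.collect(value, new Text("processed"));
  }
}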


Daniel Leffel wrote:
> [...]


Re: Bug in 0.1.3 scanner?

Posted by Daniel Leffel <da...@gmail.com>.
Breakthrough in the mystery?

The key that it fails on just happens to be the last key in the region....

Is opening the scanner on the new region the problem?



On Tue, Jul 22, 2008 at 12:18 PM, Daniel Leffel <da...@gmail.com> wrote:

> [...]

Re: Bug in 0.1.3 scanner?

Posted by Daniel Leffel <da...@gmail.com>.
OK. So, I didn't notice the IOE subclassing, which explains why I wasn't
catching it.

However, something is still definitely going on. The scanner at some point
gets "stuck". You can see the next() calls coming through in the log:
2008-07-22 13:34:18,274 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: FilterNotNull on
cols:{hierarcy:immediate_parents=[B@26c1186f}.  Result = false
2008-07-22 13:34:18,284 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,285 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: FilterNotNull on
cols:{hierarcy:immediate_parents=[B@2db6235b}.  Result = false
2008-07-22 13:34:18,292 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 1121874.  Result: false
2008-07-22 13:34:18,292 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,292 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:1121874.  Result = false
2008-07-22 13:34:18,292 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 1121874.  Result: false
2008-07-22 13:34:18,292 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,292 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:1121874, colKey: hierarcy:immediate_parents, data: [B@7d5b1773.
Result = false
2008-07-22 13:34:18,324 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 11218740.  Result: false
2008-07-22 13:34:18,324 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,325 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:11218740.  Result = false
2008-07-22 13:34:18,325 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 11218740.  Result: false
2008-07-22 13:34:18,325 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,325 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:11218740, colKey: hierarcy:immediate_parents, data: [B@b05236.
Result = false
2008-07-22 13:34:18,334 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,334 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: FilterNotNull on
cols:{hierarcy:immediate_parents=[B@7d5b1773}.  Result = false
2008-07-22 13:34:18,744 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Successfully read a
sub-filter of type: org.apache.hadoop.hbase.filter.StopRowFilter
2008-07-22 13:34:18,745 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Successfully read a
sub-filter of type: org.apache.hadoop.hbase.filter.StopRowFilter
2008-07-22 13:34:18,745 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Resetting.
2008-07-22 13:34:18,758 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 11391639.  Result: false
2008-07-22 13:34:18,758 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,758 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:11391639.  Result = false
2008-07-22 13:34:18,759 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 11391639.  Result: false
2008-07-22 13:34:18,759 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,759 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:11391639, colKey: hierarcy:immediate_parents, data: [B@273d1402.
Result = false
2008-07-22 13:34:18,779 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 1139164.  Result: false
2008-07-22 13:34:18,779 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,779 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:1139164.  Result = false
2008-07-22 13:34:18,779 DEBUG org.apache.hadoop.hbase.filter.StopRowFilter:
Filter result for rowKey: 1139164.  Result: false
2008-07-22 13:34:18,779 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,779 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: Filter on
rowKey:1139164, colKey: hierarcy:immediate_parents, data: [B@25dd9891.
Result = false
2008-07-22 13:34:18,782 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: this.filterAllRemaining
is now: false
2008-07-22 13:34:18,783 DEBUG
org.apache.hadoop.hbase.filter.WhileMatchRowFilter: FilterNotNull on
cols:{hierarcy:immediate_parents=[B@273d1402}.  Result = false


Then at some point, next just hangs and the next entry in the log is
expiration:
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-4780003109149242775/-4780003109149242775
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-4780003109149242775 lease expired
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-8543632393752004827/-8543632393752004827
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-8543632393752004827 lease expired
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-3834469434163616397/-3834469434163616397
2008-07-22 13:44:18,015 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-3834469434163616397 lease expired
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-6363251876420626143/-6363251876420626143
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-6363251876420626143 lease expired
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-4332402553799574820/-4332402553799574820
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-4332402553799574820 lease expired
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
7541512102009958038/7541512102009958038
2008-07-22 13:44:28,024 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
7541512102009958038 lease expired
2008-07-22 13:44:28,025 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
-6673102874800967750/-6673102874800967750
2008-07-22 13:44:28,025 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
-6673102874800967750 lease expired
2008-07-22 13:44:28,025 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020.leaseChecker lease expired
9143807894343107328/9143807894343107328
2008-07-22 13:44:28,025 INFO org.apache.hadoop.hbase.HRegionServer: Scanner
9143807894343107328 lease expired

Now interestingly, the map task doesn't know about the expired lease until a
number of minutes later when it finally tries to call next:
2008-07-22 14:54:29,062 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 60020, call next(-4780003109149242775) from
10.252.242.159:60900: error:
org.apache.hadoop.hbase.UnknownScannerException: Name: -4780003109149242775
org.apache.hadoop.hbase.UnknownScannerException: Name: -4780003109149242775
        at
org.apache.hadoop.hbase.HRegionServer.next(HRegionServer.java:1506)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
2008-07-22 14:54:29,070 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 60020, call close(-4780003109149242775) from
10.252.242.159:60900: error:
org.apache.hadoop.hbase.UnknownScannerException: -4780003109149242775
org.apache.hadoop.hbase.UnknownScannerException: -4780003109149242775
        at
org.apache.hadoop.hbase.HRegionServer.close(HRegionServer.java:1616)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)


Even more interesting is that the other scanners for the other tasks were
also "frozen" during that time, and their leases expired too.

Inside the iteration it's pretty simple stuff - mostly string manipulation
and a put into a BDB - and it takes <10ms per record.

The only other suspect piece of code is a reporter.progress() call, which I
had to put in or else the task tracker thinks the task is dead. I've not had
trouble with reporter.progress() to date...

Any thoughts?
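
For context, a guess at how the filters showing up in the DEBUG output above
would be wired together - a WhileMatchRowFilter wrapping a StopRowFilter,
handed to obtainScanner; the stop row value and the exact obtainScanner
overload taking a RowFilterInterface are assumptions against the 0.1.x-era API:

import java.io.IOException;

import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.filter.RowFilterInterface;
import org.apache.hadoop.hbase.filter.StopRowFilter;
import org.apache.hadoop.hbase.filter.WhileMatchRowFilter;
import org.apache.hadoop.io.Text;

public class FilteredScan {
  public static HScannerInterface open(HTable table) throws IOException {
    // Stop the scan once the row key reaches a (hypothetical) stop row;
    // the WhileMatchRowFilter wraps the StopRowFilter, as in the DEBUG output.
    RowFilterInterface filter =
        new WhileMatchRowFilter(new StopRowFilter(new Text("99999999")));
    return table.obtainScanner(
        new Text[] { new Text("hierarcy:immediate_parents") },
        new Text(),   // start at the beginning of the table
        filter);
  }
}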

On Tue, Jul 22, 2008 at 11:22 AM, stack <st...@duboce.net> wrote:

> [...]

Re: Bug in 0.1.3 scanner?

Posted by stack <st...@duboce.net>.
Hey D:

TableInputFormat#TableRecordReader#next looks like it lets out 
IOExceptions.  UnknownScannerException subclasses IOE so I'd think it 
should bubble up into your map task.

Paste in the exception you found in your logs.  That might help figure out 
what's going on.

St.Ack
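
A minimal sketch of the point being made: since UnknownScannerException is an
IOException, wrapping the scan loop in a try/catch should surface a lost lease
instead of letting the loop end silently. The scanner type and the
next(key, row) signature follow the 0.1.x-era API as recalled here:

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.UnknownScannerException;
import org.apache.hadoop.io.Text;

public class ScanLoop {
  static void drain(HScannerInterface scanner) throws IOException {
    HStoreKey key = new HStoreKey();
    SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
    try {
      while (scanner.next(key, row)) {
        // ... handle the row here ...
        row.clear();
      }
    } catch (UnknownScannerException e) {
      // The regionserver dropped the lease: treat this as an error rather
      // than as a clean end-of-scan.
      throw e;
    } finally {
      scanner.close();
    }
  }
}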

Daniel Leffel wrote:
> [...]