Posted to user@hbase.apache.org by Vidhyashankar Venkataraman <vi...@yahoo-inc.com> on 2011/04/13 09:40:03 UTC

A possible bug in the scanner.

(This could be a known issue. Please let me know if it is).

We had a set of uncompacted store files in a region. One of the column families had a store file of 5 Gigs. The other column families were pretty small (a few megabytes at most).

 It so turned out that all these files had rows whose TTL had expired. Now when this region was scanned (which should yield an empty result set), we got scanner timeouts and UnknownScannerExceptions.

And when we tried scanning the region without the large column family, the scanner returned back safely with no result.

So, I major compacted it and the scan started working correctly.

So it looks like timeouts happen if the scanner does not return any output within a specified time.
That isn't exactly the right behavior, because it could be that the scanner was indeed busy but simply had no rows yet to return to the client.

We can try increasing the scanner timeout, but this doesn't resolve the underlying problem. Is this a known issue?

Thank you
Vidhya
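
For reference, in the HBase releases of this era the scanner timeout being discussed is the region server lease period, configurable in hbase-site.xml (shown below with its 60-second default). As noted above, raising it only papers over the underlying problem:

```xml
<property>
  <name>hbase.regionserver.lease.period</name>
  <!-- Scanner lease period in milliseconds; default 60000 (60 s). -->
  <value>60000</value>
</property>
```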

Re: A possible bug in the scanner.

Posted by Himanshu Vashishtha <hv...@gmail.com>.
Vidhya, so yes: in the case of huge files with valid rows, the time-range
approach will not be effective, and neither will it help when a scanner hangs
in its next() calls because of a GC pause or some exhaustive computation. I
sent that answer after reading your initial mail (but it got posted after a
delay of 3 hrs, don't know why), and a lot of other facts came to light in
the meantime :), like HBASE-2077.
Good learning for me though :)

Thanks,
Himanshu

On Wed, Apr 13, 2011 at 11:47 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Himanshu,
>   Thanks, this will resolve the particular case we ran into. But what if
> the files are huge and have a wide range of timestamps and only some of the
> records in the file are valid? And for the other application that we have:
> scans with filters that returns a sparse set, the solution may not help.
>
>   Further, it won't solve the underlying problem. When a scanner is busy,
> but doesn't have any rows to return "yet", neither the client nor the region
> server should mistake it for an unresponsive scanner.
>
> V
>
> On 4/13/11 8:43 AM, "Himanshu Vashishtha" <hv...@cs.ualberta.ca> wrote:
>
> Vidhya,
> Did you try setting scanner time range. It takes min and max timestamps,
> and
> when instantiating the scanner  at RS, a time based filtering is done to
> include only selected store files. Have a look at
> StoreFile.shouldSeek(Scan,
> SortedSet<byte[]>). I think it should improve the response time.
>
> Himanshu
>
> On Wed, Apr 13, 2011 at 8:44 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > Hi
> >   We had enabled scanner caching but I don't think it is the same issue
> > because scanner.next in this case is blocking: the scanner is busy in the
> > region server but hasn't returned anything yet since a row to be returned
> > hasn't been found yet (all rows have expired but are still there since
> they
> > havent been compacted yet).
> >
> > Vidhya
> >
> > On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
> >
> > Have you read the following thread ?
> > "ScannerTimeoutException when a scan enables caching, no exception when
> it
> > doesn't"Did you enable caching ? If not, it is different issue.
> >
> > On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> > vidhyash@yahoo-inc.com> wrote:
> >
> > > (This could be a known issue. Please let me know if it is).
> > >
> > > We had a set of uncompacted store files in a region. One of the column
> > > families had a store file of 5 Gigs. The other column families were
> > pretty
> > > small (a few megabytes at most).
> > >
> > >  It so turned out that all these files had rows whose TTL had expired.
> > Now
> > > when this region was scanned (which should yield a result of a null
> set),
> > we
> > > got Scanner timeouts and UnknownScannerExceptions.
> > >
> > > And when we tried scanning the region without the large column family,
> > the
> > > scanner returned back safely with no result.
> > >
> > > So, I major compacted it and the scan started working correctly.
> > >
> > > So it looks like timeouts happen if the scanner does not return any
> > output
> > > for a specified time.
> > > Which isn't exactly the correct thing to do, because it could be the
> case
> > > that the scanner was indeed busy but it just so happened that there are
> > no
> > > rows yet to return back to the client.
> > >
> > > We can try increasing the scanner timeout, but this doesn't resolve the
> > > underlying problem. Is this a know issue?
> > >
> > > Thank you
> > > Vidhya
> > >
> >
> >
>
>

Re: A possible bug in the scanner.

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Himanshu,
   Thanks, this will resolve the particular case we ran into. But what if the files are huge and have a wide range of timestamps and only some of the records in the file are valid? And for the other application that we have, scans with filters that return a sparse set, the solution may not help.

   Further, it won't solve the underlying problem. When a scanner is busy, but doesn't have any rows to return "yet", neither the client nor the region server should mistake it for an unresponsive scanner.

V

On 4/13/11 8:43 AM, "Himanshu Vashishtha" <hv...@cs.ualberta.ca> wrote:

Vidhya,
Did you try setting a scanner time range? It takes min and max timestamps, and
when instantiating the scanner at the RS, time-based filtering is done to
include only selected store files. Have a look at StoreFile.shouldSeek(Scan,
SortedSet<byte[]>). I think it should improve the response time.

Himanshu

On Wed, Apr 13, 2011 at 8:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>   We had enabled scanner caching but I don't think it is the same issue
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything yet since a row to be returned
> hasn't been found yet (all rows have expired but are still there since they
> havent been compacted yet).
>
> Vidhya
>
> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>
> Have you read the following thread ?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"Did you enable caching ? If not, it is different issue.
>
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 Gigs. The other column families were
> pretty
> > small (a few megabytes at most).
> >
> >  It so turned out that all these files had rows whose TTL had expired.
> Now
> > when this region was scanned (which should yield a result of a null set),
> we
> > got Scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family,
> the
> > scanner returned back safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any
> output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there are
> no
> > rows yet to return back to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a know issue?
> >
> > Thank you
> > Vidhya
> >
>
>


Re: A possible bug in the scanner.

Posted by Himanshu Vashishtha <hv...@cs.ualberta.ca>.
Vidhya,
Did you try setting a scanner time range? It takes min and max timestamps, and
when instantiating the scanner at the RS, time-based filtering is done to
include only selected store files. Have a look at StoreFile.shouldSeek(Scan,
SortedSet<byte[]>). I think it should improve the response time.

Himanshu
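
The store-file pruning described above can be sketched with a toy model: a scan carries a [min, max) time range, and the region server keeps only store files whose timestamp range overlaps it. The class and method names here are illustrative, not the real HBase API:

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangePruning {
    public static final class StoreFile {
        final String name;
        final long minTs, maxTs; // timestamp range covered by this file

        public StoreFile(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Keep only files whose [minTs, maxTs] overlaps the scan's [scanMin, scanMax).
    public static List<StoreFile> prune(List<StoreFile> files, long scanMin, long scanMax) {
        List<StoreFile> kept = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTs >= scanMin && f.minTs < scanMax) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile("old-5g-file", 0, 999));     // every cell expired
        files.add(new StoreFile("recent-file", 1500, 2000)); // still live
        // Scan only for cells written at ts >= 1000: the big stale file is skipped
        // entirely, so the scanner never grinds through it.
        List<StoreFile> kept = prune(files, 1000, Long.MAX_VALUE);
        System.out.println(kept.size());      // prints 1
        System.out.println(kept.get(0).name); // prints recent-file
    }
}
```

This also shows why the trick fails for Vidhya's second case: if a huge file's timestamp range overlaps the scan's range at all, the whole file is kept, even when only a few records in it are valid.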

On Wed, Apr 13, 2011 at 8:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>   We had enabled scanner caching but I don't think it is the same issue
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything yet since a row to be returned
> hasn't been found yet (all rows have expired but are still there since they
> havent been compacted yet).
>
> Vidhya
>
> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>
> Have you read the following thread ?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"Did you enable caching ? If not, it is different issue.
>
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 Gigs. The other column families were
> pretty
> > small (a few megabytes at most).
> >
> >  It so turned out that all these files had rows whose TTL had expired.
> Now
> > when this region was scanned (which should yield a result of a null set),
> we
> > got Scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family,
> the
> > scanner returned back safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any
> output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there are
> no
> > rows yet to return back to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a know issue?
> >
> > Thank you
> > Vidhya
> >
>
>

Re: A possible bug in the scanner.

Posted by Gary Helmling <gh...@gmail.com>.
On Wed, Apr 13, 2011 at 10:03 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> >> Even without the TTL expiration being applied, I think I've heard of
> this in other cases where a very
> >> restrictive filter was used on a large table scan.
> Thanks, I was about to say that in a follow-up mail! We use a filter to
> scan records, produce a list of delete records and bulk load them back to
> HBase. And the same problem will exist even in that case.
>
> And in response to JD's suggestions, this problem 'might' be related
> (mid-way I see JD's comment on scanner timeouts during GCs which is quite
> analogous to the problem that I had pointed out): I can't quite pinpoint
> exactly what the bug tries to address and if any fix has come out of it. JD,
> can you let me know the status of the JIRA?
>
> Thank you
> V
>
>
In follow-up to my earlier comment on periodically renewing the lease: in the
case of a 60+ second GC pause that won't be sufficient and we'd still time
out. So maybe we do need a better solution. Lease renewal would address just
the data-filtering case, though, so it could still be an improvement over
what we currently have.

From HBASE-2077, the idea of multiple simultaneous RPC calls into the same
scanner (and hence the need for ref counting instead of simple boolean or
state) does seem a bit odd though?  Would this be needed for a future
parallel scanner implementation?  Or do we have any clear cases where this
is currently used?

--gh

Re: A possible bug in the scanner.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Vidhya, the patch in that jira is stale, needs some love.

Gary, the AtomicInteger is just there to permit multiple users of a
single Lease, which is not very common, so it can be changed.

The issue with signaling some sort of progress is that the Lease is
sleeping, so you cannot change its sleeping time. You could just
replace the Integer with an AtomicBoolean though (in use or not), so that
when the Lease wakes up it will know whether someone is currently using it.

J-D
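
The AtomicBoolean idea above can be sketched as follows: the RPC handler marks the lease in-use around each next() call, and when the lease monitor wakes up it skips expiration if a call is in flight. These names are illustrative, not the real HBase Leases implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class LeaseSketch {
    public static final class Lease {
        private final AtomicBoolean inUse = new AtomicBoolean(false);
        private boolean expired = false;

        // Called by the RPC handler around each scanner next() invocation.
        public void beginUse() { inUse.set(true); }
        public void endUse()   { inUse.set(false); }

        // Called by the lease monitor when the lease period elapses.
        // Returns true only if the lease was actually expired.
        public boolean tryExpire() {
            if (inUse.get()) {
                return false; // a scanner call is in flight: don't time it out
            }
            expired = true;
            return true;
        }

        public boolean isExpired() { return expired; }
    }

    public static void main(String[] args) {
        Lease lease = new Lease();
        lease.beginUse();                      // a long-running next() starts
        System.out.println(lease.tryExpire()); // prints false
        lease.endUse();                        // next() returns
        System.out.println(lease.tryExpire()); // prints true
    }
}
```

Unlike periodic renewal, this would keep a busy-but-silent scanner alive for the entire duration of a single next() call, however long it takes.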

On Wed, Apr 13, 2011 at 10:05 AM, Gary Helmling <gh...@gmail.com> wrote:
> Looks like the most recent patch for HBASE-2077 does try to address this
> with the usage counter.  That may be the more correct approach, but I was
> wondering if we would do something simpler with periodically renewing the
> lease down in the RegionScanner iteration?  Sort of like calling progress()
> within an MR job.
>
>
>
On Wed, Apr 13, 2011 at 9:42 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>
>> This could be HBASE-2077
>>
>> J-D
>>
>> On Wed, Apr 13, 2011 at 9:15 AM, Gary Helmling <gh...@gmail.com>
>> wrote:
>> > Hi Vidhya,
>> >
>> > So it sounds like the timeout thread is timing out the scanner when it
>> takes
>> > more than 60 seconds reading through the large column family store file
>> > without returning anything to the client?  Even without the TTL
>> expiration
>> > being applied, I think I've heard of this in other cases where a very
>> > restrictive filter was used on a large table scan.
>> >
>> > If this is the case, it certainly seems like we should handle it better.
>>  We
>> > could do something as simple as refreshing the scanner timestamp every X
>> > rows when iterating server side.
>> >
>> > I'll check the code and open a JIRA (if we don't have one existing).
>>  Thanks
>> > for detailing the problem.
>> >
>> > --gh
>> >
>> >
>> >
>> > On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
>> > vidhyash@yahoo-inc.com> wrote:
>> >
>> >> Hi
>> >>   We had enabled scanner caching but I don't think it is the same issue
>> >> because scanner.next in this case is blocking: the scanner is busy in
>> the
>> >> region server but hasn't returned anything yet since a row to be
>> returned
>> >> hasn't been found yet (all rows have expired but are still there since
>> they
>> >> havent been compacted yet).
>> >>
>> >> Vidhya
>> >>
>> >> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>> >>
>> >> Have you read the following thread ?
>> >> "ScannerTimeoutException when a scan enables caching, no exception when
>> it
>> >> doesn't"Did you enable caching ? If not, it is different issue.
>> >>
>> >> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
>> >> vidhyash@yahoo-inc.com> wrote:
>> >>
>> >> > (This could be a known issue. Please let me know if it is).
>> >> >
>> >> > We had a set of uncompacted store files in a region. One of the column
>> >> > families had a store file of 5 Gigs. The other column families were
>> >> pretty
>> >> > small (a few megabytes at most).
>> >> >
>> >> >  It so turned out that all these files had rows whose TTL had expired.
>> >> Now
>> >> > when this region was scanned (which should yield a result of a null
>> set),
>> >> we
>> >> > got Scanner timeouts and UnknownScannerExceptions.
>> >> >
>> >> > And when we tried scanning the region without the large column family,
>> >> the
>> >> > scanner returned back safely with no result.
>> >> >
>> >> > So, I major compacted it and the scan started working correctly.
>> >> >
>> >> > So it looks like timeouts happen if the scanner does not return any
>> >> output
>> >> > for a specified time.
>> >> > Which isn't exactly the correct thing to do, because it could be the
>> case
>> >> > that the scanner was indeed busy but it just so happened that there
>> are
>> >> no
>> >> > rows yet to return back to the client.
>> >> >
>> >> > We can try increasing the scanner timeout, but this doesn't resolve
>> the
>> >> > underlying problem. Is this a know issue?
>> >> >
>> >> > Thank you
>> >> > Vidhya
>> >> >
>> >>
>> >>
>> >
>>
>

Re: A possible bug in the scanner.

Posted by Gary Helmling <gh...@gmail.com>.
Looks like the most recent patch for HBASE-2077 does try to address this
with the usage counter.  That may be the more correct approach, but I was
wondering if we would do something simpler with periodically renewing the
lease down in the RegionScanner iteration?  Sort of like calling progress()
within an MR job.
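
The "renew the lease every X rows" idea above can be sketched with a toy model: while the region-side scanner iterates (including rows it filters out and never returns), it refreshes the lease timestamp every N rows, much like calling progress() in a MapReduce task. The Lease shape and numbers here are invented for illustration:

```java
public class RenewalSketch {
    public static final class Lease {
        long lastTouchedMs;

        Lease(long now) { this.lastTouchedMs = now; }
        void renew(long now) { this.lastTouchedMs = now; }
        boolean expired(long now, long periodMs) {
            return now - lastTouchedMs > periodMs;
        }
    }

    // Iterate `totalRows`, none of which survive filtering, renewing the
    // lease every `renewEvery` rows. Uses a simulated clock advancing
    // `msPerRow` milliseconds per row examined.
    public static void scanAllFiltered(int totalRows, int renewEvery,
                                       long msPerRow, Lease lease) {
        long clock = lease.lastTouchedMs;
        for (int row = 1; row <= totalRows; row++) {
            clock += msPerRow;      // time spent filtering this expired row
            if (row % renewEvery == 0) {
                lease.renew(clock); // like progress(): keep the lease alive
            }
        }
    }

    public static void main(String[] args) {
        long periodMs = 60_000; // 60 s scanner lease, as in this thread
        Lease lease = new Lease(0);
        // A million expired rows at 1 ms each is ~1000 s of work with nothing
        // returned to the client, but renewing every 1000 rows keeps the
        // lease well within its period.
        scanAllFiltered(1_000_000, 1000, 1, lease);
        System.out.println(lease.expired(1_000_000, periodMs)); // prints false
    }
}
```

As noted above, this would not cover a 60+ second GC pause, since a paused thread cannot renew anything.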



On Wed, Apr 13, 2011 at 9:42 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> This could be HBASE-2077
>
> J-D
>
> On Wed, Apr 13, 2011 at 9:15 AM, Gary Helmling <gh...@gmail.com>
> wrote:
> > Hi Vidhya,
> >
> > So it sounds like the timeout thread is timing out the scanner when it
> takes
> > more than 60 seconds reading through the large column family store file
> > without returning anything to the client?  Even without the TTL
> expiration
> > being applied, I think I've heard of this in other cases where a very
> > restrictive filter was used on a large table scan.
> >
> > If this is the case, it certainly seems like we should handle it better.
>  We
> > could do something as simple as refreshing the scanner timestamp every X
> > rows when iterating server side.
> >
> > I'll check the code and open a JIRA (if we don't have one existing).
>  Thanks
> > for detailing the problem.
> >
> > --gh
> >
> >
> >
> > On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
> > vidhyash@yahoo-inc.com> wrote:
> >
> >> Hi
> >>   We had enabled scanner caching but I don't think it is the same issue
> >> because scanner.next in this case is blocking: the scanner is busy in
> the
> >> region server but hasn't returned anything yet since a row to be
> returned
> >> hasn't been found yet (all rows have expired but are still there since
> they
> >> havent been compacted yet).
> >>
> >> Vidhya
> >>
> >> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
> >>
> >> Have you read the following thread ?
> >> "ScannerTimeoutException when a scan enables caching, no exception when
> it
> >> doesn't"Did you enable caching ? If not, it is different issue.
> >>
> >> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> >> vidhyash@yahoo-inc.com> wrote:
> >>
> >> > (This could be a known issue. Please let me know if it is).
> >> >
> >> > We had a set of uncompacted store files in a region. One of the column
> >> > families had a store file of 5 Gigs. The other column families were
> >> pretty
> >> > small (a few megabytes at most).
> >> >
> >> >  It so turned out that all these files had rows whose TTL had expired.
> >> Now
> >> > when this region was scanned (which should yield a result of a null
> set),
> >> we
> >> > got Scanner timeouts and UnknownScannerExceptions.
> >> >
> >> > And when we tried scanning the region without the large column family,
> >> the
> >> > scanner returned back safely with no result.
> >> >
> >> > So, I major compacted it and the scan started working correctly.
> >> >
> >> > So it looks like timeouts happen if the scanner does not return any
> >> output
> >> > for a specified time.
> >> > Which isn't exactly the correct thing to do, because it could be the
> case
> >> > that the scanner was indeed busy but it just so happened that there
> are
> >> no
> >> > rows yet to return back to the client.
> >> >
> >> > We can try increasing the scanner timeout, but this doesn't resolve
> the
> >> > underlying problem. Is this a know issue?
> >> >
> >> > Thank you
> >> > Vidhya
> >> >
> >>
> >>
> >
>

Re: A possible bug in the scanner.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This could be HBASE-2077

J-D

On Wed, Apr 13, 2011 at 9:15 AM, Gary Helmling <gh...@gmail.com> wrote:
> Hi Vidhya,
>
> So it sounds like the timeout thread is timing out the scanner when it takes
> more than 60 seconds reading through the large column family store file
> without returning anything to the client?  Even without the TTL expiration
> being applied, I think I've heard of this in other cases where a very
> restrictive filter was used on a large table scan.
>
> If this is the case, it certainly seems like we should handle it better.  We
> could do something as simple as refreshing the scanner timestamp every X
> rows when iterating server side.
>
> I'll check the code and open a JIRA (if we don't have one existing).  Thanks
> for detailing the problem.
>
> --gh
>
>
>
> On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
>> Hi
>>   We had enabled scanner caching but I don't think it is the same issue
>> because scanner.next in this case is blocking: the scanner is busy in the
>> region server but hasn't returned anything yet since a row to be returned
>> hasn't been found yet (all rows have expired but are still there since they
>> havent been compacted yet).
>>
>> Vidhya
>>
>> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>>
>> Have you read the following thread ?
>> "ScannerTimeoutException when a scan enables caching, no exception when it
>> doesn't"Did you enable caching ? If not, it is different issue.
>>
>> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
>> vidhyash@yahoo-inc.com> wrote:
>>
>> > (This could be a known issue. Please let me know if it is).
>> >
>> > We had a set of uncompacted store files in a region. One of the column
>> > families had a store file of 5 Gigs. The other column families were
>> pretty
>> > small (a few megabytes at most).
>> >
>> >  It so turned out that all these files had rows whose TTL had expired.
>> Now
>> > when this region was scanned (which should yield a result of a null set),
>> we
>> > got Scanner timeouts and UnknownScannerExceptions.
>> >
>> > And when we tried scanning the region without the large column family,
>> the
>> > scanner returned back safely with no result.
>> >
>> > So, I major compacted it and the scan started working correctly.
>> >
>> > So it looks like timeouts happen if the scanner does not return any
>> output
>> > for a specified time.
>> > Which isn't exactly the correct thing to do, because it could be the case
>> > that the scanner was indeed busy but it just so happened that there are
>> no
>> > rows yet to return back to the client.
>> >
>> > We can try increasing the scanner timeout, but this doesn't resolve the
>> > underlying problem. Is this a know issue?
>> >
>> > Thank you
>> > Vidhya
>> >
>>
>>
>

Re: A possible bug in the scanner.

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
>> Even without the TTL expiration being applied, I think I've heard of this in other cases where a very
>> restrictive filter was used on a large table scan.
Thanks, I was about to say that in a follow-up mail! We use a filter to scan records, produce a list of delete records and bulk load them back to HBase. And the same problem will exist even in that case.

And in response to JD's suggestions, this problem 'might' be related (mid-way I see JD's comment on scanner timeouts during GCs which is quite analogous to the problem that I had pointed out): I can't quite pinpoint exactly what the bug tries to address and if any fix has come out of it. JD, can you let me know the status of the JIRA?

Thank you
V

On 4/13/11 9:15 AM, "Gary Helmling" <gh...@gmail.com> wrote:

Hi Vidhya,

So it sounds like the timeout thread is timing out the scanner when it takes
more than 60 seconds reading through the large column family store file
without returning anything to the client?  Even without the TTL expiration
being applied, I think I've heard of this in other cases where a very
restrictive filter was used on a large table scan.

If this is the case, it certainly seems like we should handle it better.  We
could do something as simple as refreshing the scanner timestamp every X
rows when iterating server side.

I'll check the code and open a JIRA (if we don't have one existing).  Thanks
for detailing the problem.

--gh



On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>   We had enabled scanner caching but I don't think it is the same issue
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything yet since a row to be returned
> hasn't been found yet (all rows have expired but are still there since they
> havent been compacted yet).
>
> Vidhya
>
> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>
> Have you read the following thread ?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"Did you enable caching ? If not, it is different issue.
>
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 Gigs. The other column families were
> pretty
> > small (a few megabytes at most).
> >
> >  It so turned out that all these files had rows whose TTL had expired.
> Now
> > when this region was scanned (which should yield a result of a null set),
> we
> > got Scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family,
> the
> > scanner returned back safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any
> output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there are
> no
> > rows yet to return back to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a know issue?
> >
> > Thank you
> > Vidhya
> >
>
>


Re: A possible bug in the scanner.

Posted by Gary Helmling <gh...@gmail.com>.
Hi Vidhya,

So it sounds like the timeout thread is timing out the scanner when it takes
more than 60 seconds reading through the large column family store file
without returning anything to the client?  Even without the TTL expiration
being applied, I think I've heard of this in other cases where a very
restrictive filter was used on a large table scan.

If this is the case, it certainly seems like we should handle it better.  We
could do something as simple as refreshing the scanner timestamp every X
rows when iterating server side.

I'll check the code and open a JIRA (if we don't have one existing).  Thanks
for detailing the problem.

--gh



On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>   We had enabled scanner caching but I don't think it is the same issue
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything yet since a row to be returned
> hasn't been found yet (all rows have expired but are still there since they
> havent been compacted yet).
>
> Vidhya
>
> On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:
>
> Have you read the following thread ?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"Did you enable caching ? If not, it is different issue.
>
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 Gigs. The other column families were
> pretty
> > small (a few megabytes at most).
> >
> >  It so turned out that all these files had rows whose TTL had expired.
> Now
> > when this region was scanned (which should yield a result of a null set),
> we
> > got Scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family,
> the
> > scanner returned back safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any
> output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there are
> no
> > rows yet to return back to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a know issue?
> >
> > Thank you
> > Vidhya
> >
>
>

Re: A possible bug in the scanner.

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Hi
   We had enabled scanner caching but I don't think it is the same issue because scanner.next in this case is blocking: the scanner is busy in the region server but hasn't returned anything because no row to return has been found yet (all rows have expired but are still present since they haven't been compacted yet).

Vidhya

On 4/13/11 1:44 AM, "Ted Yu" <yu...@gmail.com> wrote:

Have you read the following thread ?
"ScannerTimeoutException when a scan enables caching, no exception when it
doesn't"Did you enable caching ? If not, it is different issue.

On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> (This could be a known issue. Please let me know if it is).
>
> We had a set of uncompacted store files in a region. One of the column
> families had a store file of 5 Gigs. The other column families were pretty
> small (a few megabytes at most).
>
>  It so turned out that all these files had rows whose TTL had expired. Now
> when this region was scanned (which should yield a result of a null set), we
> got Scanner timeouts and UnknownScannerExceptions.
>
> And when we tried scanning the region without the large column family, the
> scanner returned back safely with no result.
>
> So, I major compacted it and the scan started working correctly.
>
> So it looks like timeouts happen if the scanner does not return any output
> for a specified time.
> Which isn't exactly the correct thing to do, because it could be the case
> that the scanner was indeed busy but it just so happened that there are no
> rows yet to return back to the client.
>
> We can try increasing the scanner timeout, but this doesn't resolve the
> underlying problem. Is this a know issue?
>
> Thank you
> Vidhya
>


Re: A possible bug in the scanner.

Posted by Ted Yu <yu...@gmail.com>.
Have you read the following thread?
"ScannerTimeoutException when a scan enables caching, no exception when it
doesn't". Did you enable caching? If not, it is a different issue.
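
For context on why caching matters here: with scanner caching set to N, each client-side fetch triggers one RPC that returns up to N rows, so a larger N means fewer round trips but a longer window per RPC in which the lease can time out. A toy model of that trade-off (not HBase code):

```java
public class CachingSketch {
    // Number of RPCs needed to stream `totalRows` rows with a given
    // scanner-caching value (each RPC returns up to `caching` rows).
    public static int rpcCount(int totalRows, int caching) {
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(rpcCount(1000, 1));   // prints 1000
        System.out.println(rpcCount(1000, 100)); // prints 10
    }
}
```

In the problem reported in this thread, though, even a single RPC produced no rows at all before the lease expired, which is why caching settings alone could not explain it.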

On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> (This could be a known issue. Please let me know if it is).
>
> We had a set of uncompacted store files in a region. One of the column
> families had a store file of 5 Gigs. The other column families were pretty
> small (a few megabytes at most).
>
>  It so turned out that all these files had rows whose TTL had expired. Now
> when this region was scanned (which should yield a result of a null set), we
> got Scanner timeouts and UnknownScannerExceptions.
>
> And when we tried scanning the region without the large column family, the
> scanner returned back safely with no result.
>
> So, I major compacted it and the scan started working correctly.
>
> So it looks like timeouts happen if the scanner does not return any output
> for a specified time.
> Which isn't exactly the correct thing to do, because it could be the case
> that the scanner was indeed busy but it just so happened that there are no
> rows yet to return back to the client.
>
> We can try increasing the scanner timeout, but this doesn't resolve the
> underlying problem. Is this a know issue?
>
> Thank you
> Vidhya
>