Posted to dev@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/05/04 19:54:00 UTC

[jira] [Resolved] (HBASE-26997) Auto renew scanner lease in TableRecordReader

     [ https://issues.apache.org/jira/browse/HBASE-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-26997.
---------------------------------------
    Resolution: Not A Problem

I'm going to close this, actually. I realized that retries on UnknownScannerException are scoped to the individual RPC, not to the scanner as a whole. So, in theory, every single next() call could exceed the scanner lease timeout and the job would still continue. This may not be the most efficient approach, but by that point you're already not doing the most efficient thing, and calling renewLease() probably won't improve that.

> Auto renew scanner lease in TableRecordReader
> ---------------------------------------------
>
>                 Key: HBASE-26997
>                 URL: https://issues.apache.org/jira/browse/HBASE-26997
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>
> A common problem with Hadoop jobs is a mapper that takes too long to process individual inputs. This is especially problematic with TableInputFormat, because if you don't finish processing a scanner.next() batch within the scanner timeout period, your job will fail with UnknownScannerException.
> The usual fix is to lower the value passed to Scan.setCaching, so that fewer rows are returned in each batch. This isn't always a great solution, because batches may not be uniform in their processing time, and even a single row (the smallest possible caching size) might take a while to process.
> We can improve this for users by providing a configurable period at which the TableRecordReader will automatically call scanner.renewLease() unless next() was recently called.
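The renewal rule proposed in the issue (renew periodically, but skip the renewal if next() was called recently) can be sketched as a small, self-contained policy class. This is a hypothetical illustration only, not TableRecordReader's code and not the patch attached to the issue (which was closed as Not A Problem). The names LeaseRenewalPolicy, LeaseRenewable, onNext(), maybeRenew() and renewPeriodMillis are all assumptions; LeaseRenewable merely stands in for the scanner's renewLease() mentioned above, and the clock is injectable so the logic can be exercised without a cluster.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the proposed auto-renew rule.
 * Not HBase code: LeaseRenewable stands in for scanner.renewLease().
 */
class LeaseRenewalPolicy {

    /** Hypothetical stand-in for a scanner whose lease can be renewed. */
    interface LeaseRenewable {
        boolean renewLease();
    }

    private final LeaseRenewable scanner;
    private final long renewPeriodMillis; // hypothetical configurable period
    private final LongSupplier clock;     // injectable clock (millis) for testing
    private long lastActivityMillis;      // time of the last next() or successful renewal

    LeaseRenewalPolicy(LeaseRenewable scanner, long renewPeriodMillis, LongSupplier clock) {
        if (renewPeriodMillis <= 0) {
            throw new IllegalArgumentException("renewPeriodMillis must be positive");
        }
        this.scanner = scanner;
        this.renewPeriodMillis = renewPeriodMillis;
        this.clock = clock;
        this.lastActivityMillis = clock.getAsLong();
    }

    /** The record reader would call this whenever it invokes scanner.next(). */
    synchronized void onNext() {
        lastActivityMillis = clock.getAsLong();
    }

    /**
     * Intended to be called on a timer. Renews the lease only if next() has not
     * been called within the last renewPeriodMillis; returns true only when a
     * renewal was attempted and reported success.
     */
    synchronized boolean maybeRenew() {
        long now = clock.getAsLong();
        if (now - lastActivityMillis < renewPeriodMillis) {
            return false; // next() was called recently, so the lease is still fresh
        }
        boolean renewed = scanner.renewLease();
        if (renewed) {
            lastActivityMillis = now; // a successful renewal also resets the timer
        }
        return renewed;
    }
}
```

A caller might run maybeRenew() from a ScheduledExecutorService at some fraction of hbase.client.scanner.timeout.period. Note that the synchronized methods only serialize the policy's own bookkeeping; whether a real scanner tolerates renewLease() being called from another thread while next() is in flight is not established by this sketch.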



--
This message was sent by Atlassian Jira
(v8.20.7#820007)