Posted to issues@hbase.apache.org by "Jan Lukavsky (Created) (JIRA)" <ji...@apache.org> on 2012/04/10 11:59:18 UTC

[jira] [Created] (HBASE-5757) TableInputFormat should handle as much errors as possible

TableInputFormat should handle as much errors as possible
---------------------------------------------------------

                 Key: HBASE-5757
                 URL: https://issues.apache.org/jira/browse/HBASE-5757
             Project: HBase
          Issue Type: Bug
          Components: mapred, mapreduce
    Affects Versions: 0.90.6
            Reporter: Jan Lukavsky


Prior to HBASE-4196, IOExceptions thrown from the scanner were handled differently in the mapred and mapreduce APIs. The patch for HBASE-4196 unified this handling so that if an exception is caught, a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both the mapred and mapreduce APIs. The question is: is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of the current approach:
 * the client may see exceptions like LeaseException and ScannerTimeoutException if it fails to process all fetched data within the timeout
 * to avoid ScannerTimeoutException, the client must raise hbase.regionserver.lease.period
 * task timeouts are already configured in mapred.task.timeout, so this seems a bit redundant to me, because typically one needs to update both of these parameters
 * I don't see any way to get rid of LeaseException (this is configured on the server side)
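
To make the redundancy in the list above concrete, here is a minimal Java sketch of keeping the two timeouts in step. It is an illustration only: java.util.Properties stands in for a Hadoop/HBase configuration object, the class and method names are hypothetical, and setting both values to the same number of milliseconds is just one possible policy. As the last bullet notes, a job's client-side configuration cannot change the lease configured on the region server.

```java
import java.util.Properties;

// Hypothetical helper; java.util.Properties stands in for a Hadoop/HBase
// configuration object so the sketch has no external dependencies.
class TimeoutConfigSketch {
    static final String LEASE_PERIOD = "hbase.regionserver.lease.period";
    static final String TASK_TIMEOUT = "mapred.task.timeout";

    /** Returns a copy of base with both timeouts set to the same value in milliseconds. */
    static Properties withTimeouts(Properties base, long millis) {
        Properties copy = new Properties();
        copy.putAll(base);
        String value = Long.toString(millis);
        copy.setProperty(LEASE_PERIOD, value);
        copy.setProperty(TASK_TIMEOUT, value);
        return copy;
    }

    public static void main(String[] args) {
        Properties conf = withTimeouts(new Properties(), 600_000L);
        System.out.println(LEASE_PERIOD + "=" + conf.getProperty(LEASE_PERIOD));
        System.out.println(TASK_TIMEOUT + "=" + conf.getProperty(TASK_TIMEOUT));
    }
}
```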

I think all of these issues would go away if the DoNotRetryIOException were not rethrown. On the other hand, handling errors in the InputFormat has the disadvantage that it may hide some inefficiency from the user. E.g., if I have a very large scanner.caching and I manage to process only a few rows within the timeout, I will end up with a single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding a counter to the InputFormat?
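
The idea of reissuing the request after any IOException can be sketched as follows. This is not the code from the attached patches; it is a self-contained illustration in which RowScanner, ScannerFactory, RestartingReader and flakyTable are hypothetical stand-ins for the HBase scanner and record reader. The reader remembers the last row it delivered, reopens the scanner at that row on failure, skips the row that was already delivered, and retries once.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; none of these types are HBase classes.
class RestartingReaderSketch {

    /** Minimal scanner abstraction: returns the next row key, or null at the end. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Opens a scanner at startRow (inclusive); null means the start of the table. */
    interface ScannerFactory {
        RowScanner open(String startRow) throws IOException;
    }

    static final class RestartingReader {
        private final ScannerFactory factory;
        private RowScanner scanner;
        private String lastRow; // last row handed to the caller
        private int restarts;   // how many times the scanner was reopened

        RestartingReader(ScannerFactory factory) throws IOException {
            this.factory = factory;
            this.scanner = factory.open(null);
        }

        String next() throws IOException {
            String row;
            try {
                row = scanner.next();
            } catch (IOException e) {
                // Any IOException: reopen and retry once; a second failure propagates.
                restarts++;
                if (lastRow == null) {
                    scanner = factory.open(null);
                } else {
                    scanner = factory.open(lastRow);
                    scanner.next(); // skip the row that was already delivered
                }
                row = scanner.next();
            }
            if (row != null) {
                lastRow = row;
            }
            return row;
        }

        int restarts() {
            return restarts;
        }
    }

    /** In-memory table whose first scanner throws once after serving failAfter rows. */
    static ScannerFactory flakyTable(List<String> rows, int failAfter) {
        final boolean[] failed = {false};
        return startRow -> {
            int from = startRow == null ? 0 : rows.indexOf(startRow);
            Iterator<String> it = rows.subList(from, rows.size()).iterator();
            final int[] served = {0};
            return () -> {
                if (!failed[0] && served[0] == failAfter) {
                    failed[0] = true;
                    throw new IOException("simulated scanner timeout");
                }
                served[0]++;
                return it.hasNext() ? it.next() : null;
            };
        };
    }

    static List<String> readAll(RestartingReader reader) throws IOException {
        List<String> out = new ArrayList<>();
        for (String row = reader.next(); row != null; row = reader.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        RestartingReader reader =
            new RestartingReader(flakyTable(Arrays.asList("a", "b", "c", "d"), 2));
        System.out.println(readAll(reader) + " restarts=" + reader.restarts());
    }
}
```

Retrying only once per call keeps a persistent failure visible to the caller instead of looping forever; whether that is the right limit is a design choice this sketch does not settle.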

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as much errors as possible

Posted by "Jan Lukavsky (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251460#comment-13251460 ] 

Jan Lukavsky commented on HBASE-5757:
-------------------------------------

The problem with rows being fetched multiple times doesn't exist. I thought (I don't know why) that a ScannerTimeoutException could be thrown while processing rows cached in the scanner on the client side. This is not the case. Adding a counter for the number of retries in the input format might be interesting nevertheless.
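
A retry counter of the kind mentioned above could look roughly like this. It is a minimal sketch with hypothetical names (RetryCounterSketch, IoCall, callWithRetry); in a real job the count would presumably be reported through the framework's counters or metrics, and that wiring is omitted here.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of HBase or Hadoop.
class RetryCounterSketch {

    /** An operation that may fail with an IOException, e.g. fetching the next row. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    private final AtomicLong retries = new AtomicLong();

    /** Runs the call; on IOException, records a retry and tries exactly once more. */
    <T> T callWithRetry(IoCall<T> call) throws IOException {
        try {
            return call.call();
        } catch (IOException first) {
            retries.incrementAndGet();
            return call.call(); // a second failure propagates to the caller
        }
    }

    /** Number of retries recorded so far. */
    long retries() {
        return retries.get();
    }
}
```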
                
> TableInputFormat should handle as much errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5757:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.2
                   0.90.7
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed the 0.92 version to the 0.92 and 0.90 branches.  Thanks for the review, Ted; thanks for the patches, Jan!
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jan Lukavsky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------

    Attachment: HBASE-5757-trunk-r1341041.patch

The commit of the patch for HBASE-6004 conflicted with this one. I have merged the patch; the new one should apply to revision 1341041.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Assigned] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh reassigned HBASE-5757:
-------------------------------------

    Assignee: Jan Lukavsky
    
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280315#comment-13280315 ] 

Jonathan Hsieh commented on HBASE-5757:
---------------------------------------

Zhihong, thanks for pinging me about this.  Jan, thanks for being patient with me on this.

The changes look good.  The patch applies to 0.94 and trunk.  I believe the request was for getting this into 0.90 -- I'll look into backporting this behavior to that version.


                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280240#comment-13280240 ] 

Hadoop QA commented on HBASE-5757:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528434/HBASE-5757-trunk-r1341041.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
                  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
                  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//console

This message is automatically generated.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5757:
------------------------------

    Status: Patch Available  (was: Open)
    
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5757:
----------------------------------

    Attachment: hbase-5757-92.patch

hbase-5757-92.patch is for the 0.92 and 0.90 versions.  The underlying metrics have changed, so it does not update metrics as in 0.94 or trunk/0.96.  It does, however, include the updated tests that demonstrate the updated semantics.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jan Lukavsky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------

    Summary: TableInputFormat should handle as many errors as possible  (was: TableInputFormat should handle as much errors as possible)
    
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280249#comment-13280249 ] 

Zhihong Yu commented on HBASE-5757:
-----------------------------------

I ran the following two tests and they passed with the latest patch:
{code}
  mt -Dtest=TestClassLoading
  mt -Dtest=TestSplitTransactionOnCluster
{code}
The replication tests have been failing and are not related to this change.

Minor comments:
{code}
+        // try to handle exceptions all possible exceptions by restarting
{code}
The first 'exceptions ' should be removed.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280412#comment-13280412 ] 

Hadoop QA commented on HBASE-5757:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528472/hbase-5757-92.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1946//console

This message is automatically generated.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch

[jira] [Closed] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl closed HBASE-5757.
--------------------------------

    
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.94.1, 0.96.0
>
>         Attachments: 5757-trunk-v2.txt, hbase-5757-92.patch, HBASE-5757.patch, HBASE-5757.patch, HBASE-5757-trunk-r1341041.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280157#comment-13280157 ] 

Zhihong Yu commented on HBASE-5757:
-----------------------------------

@Jan:
Neither patch applies to trunk as of today.
Can you attach a patch for trunk and name it accordingly?

Thanks
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch, HBASE-5757.patch

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5757:
------------------------------

    Attachment: 5757-trunk-v2.txt

Patch v2 changes the comments w.r.t. exceptions being handled.

@Jon:
Do you have further comments?
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jan Lukavsky (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271197#comment-13271197 ] 

Jan Lukavsky commented on HBASE-5757:
-------------------------------------

Hi Jon,

I'm not sure, but IMO the purpose of DoNotRetryIOException is to instruct the HTable client not to retry the request. In TableInputFormat we are working at a higher level, so retrying is OK. DNRIOEx serves to distinguish such errors from exceptions that might be caused by, for instance, region reassignment, and that might disappear if the request is resent (possibly after dropping the cached region location and querying .META. again). UnknownScannerException, on the other hand, will not 'disappear' if the *same* request is sent by the HTable client. But in the InputFormat we can restart the scanner, so we will not send the same request, and hence it can succeed.

We retry the request just once and then give up in order to avoid infinite cycles. Retrying once mostly suffices, because a typical cause of UnknownScannerException or LeaseException is a Mapper that is too slow (there could be other causes, like scanning for a too sparse column, but those will not be solved by this issue :)). It is possible to lower the scanner caching, but this might be inefficient (e.g. when the caching is just OK 99.99% of the time, and then there are some strange records that take the Mapper longer to process). Lowering the caching globally just because of these few records doesn't sound like the 'correct' solution.
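
A minimal sketch of the restart-once idea discussed in this comment, under stated assumptions: RowScanner, ScannerFactory, and the in-memory flaky() factory are hypothetical stand-ins invented for illustration, not HBase classes, and this is not the code from the attached patch. It only shows the shape of the logic: on an IOException, open a new scanner at the last row already returned, skip that row, and let a second failure propagate. The restarts counter is the kind of counter the issue description asks about.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins only; none of these types are HBase classes.
final class RestartOnceSketch {

    /** Returns the next row key, or null when the scan is exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Opens a new scanner at startRow (inclusive); null means "from the beginning". */
    interface ScannerFactory {
        RowScanner open(String startRow) throws IOException;
    }

    private final ScannerFactory factory;
    private final String startRow;
    private RowScanner scanner;
    private String lastRow;   // last row handed to the caller
    private int restarts;     // makes otherwise hidden restarts visible

    RestartOnceSketch(ScannerFactory factory, String startRow) throws IOException {
        this.factory = factory;
        this.startRow = startRow;
        this.scanner = factory.open(startRow);
    }

    int restarts() {
        return restarts;
    }

    String next() throws IOException {
        String row;
        try {
            row = scanner.next();
        } catch (IOException e) {
            // Open a *new* scanner, so the retry is not the same request.
            restarts++;
            if (lastRow == null) {
                scanner = factory.open(startRow);
            } else {
                scanner = factory.open(lastRow);
                scanner.next(); // skip the row that was already returned
            }
            row = scanner.next(); // a second failure propagates to the caller
        }
        if (row != null) {
            lastRow = row;
        }
        return row;
    }

    /** In-memory factory that throws exactly one IOException, after failAfter rows were served. */
    static ScannerFactory flaky(List<String> rows, int failAfter) {
        boolean[] failed = {false};
        return start -> {
            int[] pos = {start == null ? 0 : Math.max(0, rows.indexOf(start))};
            int[] served = {0};
            return () -> {
                if (!failed[0] && served[0] == failAfter) {
                    failed[0] = true;
                    throw new IOException("simulated scanner lease expiry");
                }
                served[0]++;
                return pos[0] < rows.size() ? rows.get(pos[0]++) : null;
            };
        };
    }

    /** Reads four rows through a scanner that may fail once; wraps checked exceptions. */
    static String demo(int failAfter) {
        try {
            List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");
            RestartOnceSketch reader = new RestartOnceSketch(flaky(rows, failAfter), null);
            List<String> out = new ArrayList<>();
            for (String row = reader.next(); row != null; row = reader.next()) {
                out.add(row);
            }
            return out + " restarts=" + reader.restarts();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(2));
    }
}
```

In this sketch every row is still delivered exactly once after a single failure, and the caller can read restarts() to notice that a restart happened.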


                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jan Lukavsky (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275793#comment-13275793 ] 

Jan Lukavsky commented on HBASE-5757:
-------------------------------------

{quote}Note that we've been able to set scanner caching on each individual scan since 0.20 (HBASE-1759) – setting it for that job may be more 'correct'.{quote}

We are setting different caching for different jobs. The problem is that rows may take different amounts of time to process (depending on the job), and this cannot be known in advance. Currently it is only possible to set the caching for the whole job, but even if it were possible to change the caching *during* the job, we would not know that we need to do so before we get the ScannerTimeoutException. So handling this error in the TableInputFormat seems to me the right solution.
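
To illustrate why a per-job caching value cannot rule out the timeout, here is a rough back-of-the-envelope sketch. It rests on a simplifying assumption: the scanner lease is renewed only when the client returns to the region server for the next batch of cached rows. The 60 s lease period, the row counts, and the per-row times are made-up example values, and ScannerLeaseEstimate is a hypothetical helper, not part of HBase.

```java
// A simplified model, not HBase code: the scanner lease is assumed to be
// renewed only when the client goes back to the region server for the next batch.
final class ScannerLeaseEstimate {

    /** Milliseconds a mapper spends on one cached batch of rows. */
    static long batchMillis(int fastRows, long fastMillis, int slowRows, long slowMillis) {
        return fastRows * fastMillis + slowRows * slowMillis;
    }

    /** True if processing one batch takes longer than the lease period. */
    static boolean leaseWouldExpire(long batchMillis, long leasePeriodMillis) {
        return batchMillis > leasePeriodMillis;
    }

    public static void main(String[] args) {
        long leasePeriod = 60_000L; // assumed hbase.regionserver.lease.period, in ms
        long typical = batchMillis(1000, 10, 0, 0);      // 1000 rows at 10 ms each
        long unlucky = batchMillis(995, 10, 5, 15_000);  // same caching, 5 rows at 15 s each
        System.out.println(typical + " ms, expires: " + leaseWouldExpire(typical, leasePeriod));
        System.out.println(unlucky + " ms, expires: " + leaseWouldExpire(unlucky, leasePeriod));
    }
}
```

With the same caching of 1000 rows, the typical batch stays well under the assumed lease period, while five unusually slow rows push the batch past it. That is the situation that cannot be predicted when the job is configured.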
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch, HBASE-5757.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280622#comment-13280622 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-0.92 #415 (See [https://builds.apache.org/job/HBase-0.92/415/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341205)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as much errors as possible

Posted by "Jan Lukavsky (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------

    Attachment: HBASE-5757.patch

Attaching a *very* simple patch with no test modifications. It works for us (the mapred API is not tested). No counter for the restarts is added, though.
                
> TableInputFormat should handle as much errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270176#comment-13270176 ] 

Jonathan Hsieh commented on HBASE-5757:
---------------------------------------

Jan,

I looked at the logic again and I think you are right.  When I took a quick glance last time, I only saw the isolated patch and didn't have enough context to see the existing retry logic (Review Board is helpful).

Mind adding some comments explaining why it is OK to retry here?  (We retry once, and if we fail twice we give up.) It seems strange to me that we are retrying something that throws a DoNotRetryIOException.

Anyone else have any comments?
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283267#comment-13283267 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-0.92-security #108 (See [https://builds.apache.org/job/HBase-0.92-security/108/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341205)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5757:
----------------------------------

    Fix Version/s: 0.94.1
                   0.96.0

Committed to 0.94 and 0.96.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Updated] (HBASE-5757) TableInputFormat should handle as much errors as possible

Posted by "Jan Lukavsky (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------

    Description: 
Prior to HBASE-4196 there was different handling of IOExceptions thrown from scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this handling so that if exception is caught a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both mapred and mapreduce APIs. The question is, is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of current approach
 * the client may see exceptions like LeaseException and ScannerTimeoutException if he fails to process all fetched data within the timeout
 * to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period
 * timeouts for tasks are already configured in mapred.task.timeout, so this seems to me a bit redundant, because typically one needs to update both these parameters
 * I don't see any possibility to get rid of LeaseException (this is configured on server side)

I think all of these issues would be gone, if the DoNotRetryIOException would not be rethrown. -On the other hand, handling errors in InputFormat has disadvantage, that it may hide from the user some inefficiency. Eg. if I have very big scanner.caching, and I manage to process only a few rows in timeout, I will end up with single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?-

  was:
Prior to HBASE-4196 there was different handling of IOExceptions thrown from scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this handling so that if exception is caught a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both mapred and mapreduce APIs. The question is, is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of current approach
 * the client may see exceptions like LeaseException and ScannerTimeoutException if he fails to process all fetched data within the timeout
 * to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period
 * timeouts for tasks are already configured in mapred.task.timeout, so this seems to me a bit redundant, because typically one needs to update both these parameters
 * I don't see any possibility to get rid of LeaseException (this is configured on server side)

I think all of these issues would be gone, if the DoNotRetryIOException would not be rethrown. On the other hand, handling errors in InputFormat has disadvantage, that it may hide from the user some inefficiency. Eg. if I have very big scanner.caching, and I manage to process only a few rows in timeout, I will end up with single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?

    
> TableInputFormat should handle as much errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280385#comment-13280385 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-0.94 #205 (See [https://builds.apache.org/job/HBase-0.94/205/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341133)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280398#comment-13280398 ] 

Jonathan Hsieh commented on HBASE-5757:
---------------------------------------

Zhihong, Jan, if the 0.92/0.90 versions look good to you I will commit.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280650#comment-13280650 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341132)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280318#comment-13280318 ] 

Zhihong Yu commented on HBASE-5757:
-----------------------------------

TestHLog failure was caused by:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:41331 : Address already in use
	at org.apache.hadoop.ipc.Server.bind(Server.java:227)
	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
{code}
I ran it locally and it passed.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280433#comment-13280433 ] 

Zhihong Yu commented on HBASE-5757:
-----------------------------------

TestTableInputFormat passed in 0.92 with 0.92 patch.

+1 from me.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280328#comment-13280328 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-TRUNK #2911 (See [https://builds.apache.org/job/HBase-TRUNK/2911/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341132)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280285#comment-13280285 ] 

Hadoop QA commented on HBASE-5757:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528448/5757-trunk-v2.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
                  org.apache.hadoop.hbase.regionserver.wal.TestHLog
                  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//console

This message is automatically generated.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
>
>

[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281288#comment-13281288 ] 

Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-0.94-security #28 (See [https://builds.apache.org/job/HBase-0.94-security/28/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341133)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java

                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>


[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jan Lukavsky (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------

    Attachment: HBASE-5757.patch

Attaching patch including modified tests (pass on my box) and counter in the new API.
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch, HBASE-5757.patch
>
>


[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271681#comment-13271681 ] 

Jonathan Hsieh commented on HBASE-5757:
---------------------------------------

Got it, great clarification on the DNRIOExn.  Can you add this in the comments of the catch block in TableInputFormat?  If it passes tests then I'll commit.  If you could add a Hadoop counter, that would be awesome (or file a JIRA to add one).

I have a feeling there might be a configuration workaround.  Are you using scanner caching at all on your client?  (The default is no caching.)  It seems like there would be a sweet spot above which there are diminishing returns.  It sounds like in your case the rows may be variably sized, which makes this difficult.

Note that we've been able to set scanner caching on each individual scan since 0.20 (HBASE-1759) -- setting it for that job may be more 'correct'.
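
To make the "sweet spot" concrete, here is a rough back-of-the-envelope sketch. It assumes the scanner lease is only renewed when the client goes back to the region server for the next batch, so one cached batch of N rows has to be processed in less than the lease period. The class and method names are hypothetical and are not part of HBase.

```java
/** Hypothetical helper; a sketch of the arithmetic only, not HBase code. */
final class ScannerCachingSketch {

    /**
     * Returns the largest caching value N such that N * worstCaseRowMs is
     * strictly less than leasePeriodMs. The result is never below 1; if a
     * single row takes longer than the lease, no caching value can help.
     */
    static int maxSafeCaching(long leasePeriodMs, long worstCaseRowMs) {
        if (leasePeriodMs <= 0 || worstCaseRowMs <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // Largest N with N * worstCaseRowMs <= leasePeriodMs - 1.
        long rows = (leasePeriodMs - 1) / worstCaseRowMs;
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }
}
```

For example, with an assumed 60-second lease and a mapper that needs up to 250 ms per row, maxSafeCaching(60000, 250) gives 239. With variably sized rows the worst-case per-row time is hard to estimate, which is the difficulty mentioned above.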

Also, it looks like some of this code could use a cleanup -- HBASE-2161 is another JIRA that says ScannerTimeoutException may be cruft -- why is it separate from LeaseException? (possibly related to ).  I think I would prefer that we explicitly call out the exceptions (UnknownScannerException, LeaseException and ScannerTimeoutException) that we retry on and leave the rest to be rethrown (there was a recent thread discussing IOException abuse).
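
A minimal sketch of what "explicitly call out the exceptions" could look like, combined with a restart counter. This is not the attached patch: the three exception classes are local stand-ins named after the HBase ones so the example compiles on its own, RowSource is a hypothetical stand-in for the scanner, and the plain long field stands in for a Hadoop job counter.

```java
import java.io.IOException;

/** Hypothetical sketch of a record reader that restarts only on listed exceptions. */
final class RetryingReaderSketch {

    // Local stand-ins for the HBase exception types named in the comment above.
    static class UnknownScannerException extends IOException {}
    static class LeaseException extends IOException {}
    static class ScannerTimeoutException extends IOException {}

    /** Hypothetical stand-in for the underlying scanner. */
    interface RowSource {
        String next() throws IOException;
        void restartAfter(String lastRow) throws IOException;
    }

    private final RowSource source;
    private String lastRow;  // last row handed to the caller, null before the first one
    private long restarts;   // stand-in for a job counter visible to the user

    RetryingReaderSketch(RowSource source) {
        this.source = source;
    }

    /** Only the exceptions listed here trigger a scanner restart. */
    static boolean isRestartable(IOException e) {
        return e instanceof UnknownScannerException
            || e instanceof LeaseException
            || e instanceof ScannerTimeoutException;
    }

    /** Returns the next row, restarting the source once on a listed exception. */
    String next() throws IOException {
        String row;
        try {
            row = source.next();
        } catch (IOException e) {
            if (!isRestartable(e)) {
                throw e;  // anything not listed is rethrown to the caller
            }
            restarts++;
            source.restartAfter(lastRow);
            row = source.next();  // a second failure propagates unchanged
        }
        if (row != null) {
            lastRow = row;
        }
        return row;
    }

    long restarts() {
        return restarts;
    }
}
```

Counting restarts addresses the concern in the issue description that silent retries could hide an oversized scanner caching value: a job with a large restart count is refetching rows even though it never fails.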


                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch
>
>
