You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/03 08:19:43 UTC

[jira] Created: (HADOOP-2333) [hbase] client side retries happen at the wrong level

[hbase] client side retries happen at the wrong level
-----------------------------------------------------

                 Key: HADOOP-2333
                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
    Affects Versions: 0.16.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.16.0


Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.

However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.

In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:

{code}
2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
{code}

Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.

Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tests passed. Committed.

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Status: Open  (was: Patch Available)

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Attachment: patch.txt

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548416 ] 

Hadoop QA commented on HADOOP-2333:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12370969/patch.txt
against trunk revision r601038.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1260/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1260/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1260/console

This message is automatically generated.

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Status: Patch Available  (was: Open)

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Status: Patch Available  (was: Open)

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: patch.txt
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.