You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/03 09:52:43 UTC

[jira] Updated: (HADOOP-2333) [hbase] client side retries happen at the wrong level

     [ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2333:
----------------------------------

    Status: Patch Available  (was: Open)

> [hbase] client side retries happen at the wrong level
> -----------------------------------------------------
>
>                 Key: HADOOP-2333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2333
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>
> Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table.
> However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect.
> In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example:
> {code}
> 2007-12-03 05:57:30,433 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions)
> 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0
> 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k
> 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021
> 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents
> 2007-12-03 05:57:30,934 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142
> 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed
> 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log)
> 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021
> 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log)
> 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021
> 2007-12-03 05:57:32,856 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137}
> 2007-12-03 05:57:33,479 WARN  [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting...
> 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022
> 2007-12-03 05:57:33,486 INFO  [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435
> 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned
> 2007-12-03 05:57:33,490 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete
> 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log)
> 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021
> 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest
> 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log)
> 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest
> 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,mwu~`@#$%^&*()-_+=:;',.<>/?[]{}|,1196661378142
> 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true)
> org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510)
> 	at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482)
> 	at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365)
> 	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> {code}
> Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce.
> Moving the retry logic to findServersForTable would handle both the cases described above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.