Posted to dev@hbase.apache.org by "rajeshbabu (JIRA)" <ji...@apache.org> on 2014/03/06 13:40:45 UTC

[jira] [Resolved] (HBASE-9636) HBase shell/client 'scan table' operation is getting failed inbetween the when the regions are shifted from one Region Server to another Region Server

     [ https://issues.apache.org/jira/browse/HBASE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rajeshbabu resolved HBASE-9636.
-------------------------------

    Resolution: Not A Problem

[~shankarlingayya]
 This is expected behavior.
 According to the logs you shared, the region server holding the row17-row18 range went down at 18:20:58:
 {code}
 Fri Sep 20 18:20:58 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
 Fri Sep 20 18:20:59 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: HOST-10-18-40-172/10.18.40.172:61020
 {code}
 
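 After that, the regions in that range took a while to be reassigned, because HOST-10-18-40-172 held many regions and they are assigned one by one after a server shutdown.
 The logs below show this. Until a region is reassigned, the META table still holds the old region server address. On each exception the client clears its cached region location and re-reads it from META, but META still pointed to HOST-10-18-40-172, so the scan failed after 7 retries. For example, the request to open the t1 region starting at row170593 only arrived at 18:21:33, after the client's last attempt at 18:21:17.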
 
 {code}
2013-09-20 18:21:33,539 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: t1,row170593,1379679042365.1ad0997453c665bb9707907be08980fa.
 
2013-09-20 18:21:33,551 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:61020-0x1413b3594140079-0x1413b3594140079-0x1413b3594140079-0x1413b3594140079 Attempting to transition node 1ad0997453c665bb9707907be08980fa from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
 
2013-09-20 18:21:33,557 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://10.18.40.153:8020/hbase/t1/c18a2bbd6ef4b53f480b53207a68c44e/cf1/04b0a1c45c9f498ebbfd4f8909e693a4, isReference=false, isBulkL
 {code}
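To make the failure mode concrete, here is a minimal Java sketch of the retry loop described above. It is a toy model, not HBase client code: on each failed attempt the client looks the region up again, pauses, and retries until it runs out of attempts. It assumes a constant pause (the real client grows the pause with a backoff table), ignores the time each failed RPC itself takes, and uses a hypothetical replacement server name, HOST-NEW. The ~35 s reassignment delay comes from the timestamps above (server down at 18:20:58, region open request at 18:21:33).

```java
import java.util.function.LongFunction;

/**
 * Toy model (not HBase code) of a client retrying a scan RPC after a
 * region server dies. Time is simulated; nothing actually sleeps.
 */
public class ScanRetryModel {

    /** Outcome of one simulated scan call. */
    record Outcome(boolean succeeded, int attempts, long elapsedMs) {}

    /**
     * @param metaLookup server that META reports for the region at simulated time t (ms)
     * @param liveServer server that actually serves the region after reassignment
     * @param retries    maximum number of attempts (cf. hbase.client.retries.number)
     * @param pauseMs    fixed sleep between attempts (cf. hbase.client.pause)
     */
    static Outcome scanWithRetries(LongFunction<String> metaLookup, String liveServer,
                                   int retries, long pauseMs) {
        long now = 0; // simulated ms since the old region server went down
        for (int attempt = 1; attempt <= retries; attempt++) {
            // Simplification: every attempt uses whatever META currently says
            // (the real client re-reads META after clearing its cache on failure).
            String location = metaLookup.apply(now);
            if (location.equals(liveServer)) {
                return new Outcome(true, attempt, now);
            }
            // Connection refused / failed-server error: pause before the next attempt.
            if (attempt < retries) {
                now += pauseMs;
            }
        }
        // Corresponds to RetriesExhaustedException in the real client.
        return new Outcome(false, retries, now);
    }

    public static void main(String[] args) {
        long reassignedAtMs = 35_000; // 18:20:58 -> 18:21:33 in the logs above
        // "HOST-NEW" is a hypothetical name for the server that takes over the region.
        LongFunction<String> meta = t -> t < reassignedAtMs ? "HOST-10-18-40-172" : "HOST-NEW";

        System.out.println(scanWithRetries(meta, "HOST-NEW", 7, 1_000));  // Outcome[succeeded=false, attempts=7, elapsedMs=6000]
        System.out.println(scanWithRetries(meta, "HOST-NEW", 40, 1_000)); // Outcome[succeeded=true, attempts=36, elapsedMs=35000]
        System.out.println(scanWithRetries(meta, "HOST-NEW", 7, 6_000));  // Outcome[succeeded=true, attempts=7, elapsedMs=36000]
    }
}
```

With the shell default of 7 attempts the model gives up long before META is updated, while either more attempts or a longer pause lets the last attempt land after reassignment. Those are the two knobs listed in the settings that follow.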
 The following client settings can be tuned to avoid this kind of failure:
 1) Increase the retry count (hbase.client.retries.number; default 7 from the shell and 10 from the client).
 2) Increase the pause between retries (hbase.client.pause; default 1 sec).
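When sizing these two settings, a rough estimate of how long the client keeps retrying is the sum of the sleeps between attempts. The sketch below assumes each sleep is hbase.client.pause multiplied by a per-attempt backoff factor. The real client keeps such a table (HConstants.RETRY_BACKOFF in the HBase source), but the values here are illustrative and should be checked against your HBase version. It counts only sleep time, not connection timeouts, so the real window is at least this long. In an actual client the values would go into hbase-site.xml, or be set with Configuration.setInt / setLong before the table handle is created.

```java
/**
 * Rough lower-bound estimate of how long a client keeps retrying before it
 * gives up: only the sleeps between attempts are counted.
 */
public class RetryWindow {

    // Illustrative pause multipliers per attempt (an assumption for this sketch;
    // the real table is version-specific).
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    /** Total sleep time across all attempts for the given retry count and pause. */
    static long retryWindowMs(int retries, long pauseMs, int[] backoff) {
        long total = 0;
        // N attempts have N - 1 sleeps between them; past the end of the
        // table, the last multiplier is reused.
        for (int i = 0; i < retries - 1; i++) {
            int mult = backoff[Math.min(i, backoff.length - 1)];
            total += pauseMs * mult;
        }
        return total;
    }

    public static void main(String[] args) {
        long needMs = 35_000; // ~18:20:58 -> 18:21:33 in the logs above
        int[][] settings = {{7, 1_000}, {10, 1_000}, {10, 2_000}}; // {retries, pauseMs}
        for (int[] s : settings) {
            long window = retryWindowMs(s[0], s[1], BACKOFF);
            System.out.printf("retries=%d pause=%dms -> ~%d ms of retrying, covers 35 s: %b%n",
                    s[0], s[1], window, window >= needMs);
        }
    }
}
```

Under these assumed multipliers, 7 retries with a 1 s pause give about 11 s of retrying, well short of the ~35 s it took to reopen the region in this case, whereas 10 retries give about 39 s.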

>  HBase shell/client 'scan table' operation is getting failed inbetween the when the regions are shifted from one Region Server to another Region Server 
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9636
>                 URL: https://issues.apache.org/jira/browse/HBASE-9636
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.94.11
>         Environment: SuSE11
>            Reporter: shankarlingayya
>            Assignee: rajeshbabu
>
> {noformat}
> Problem:
> The HBase shell/client 'scan table' operation fails partway through when the regions are moved from one Region Server to another Region Server.
> When the table regions are moved from one Region Server to another Region Server, the client/shell should be able to read the data from the
> new Region Server automatically (with huge data volumes in the GB/TB range, one of the Region Servers going down in the cluster is a frequent occurrence).
> Procedure:
> 1. Set up a non-HA Hadoop cluster with two nodes (Node1-XX.XX.XX.XX, Node2-YY.YY.YY.YY)
> 2. Install ZooKeeper, HMaster & HRegionServer on Node1
> 3. Install HRegionServer on Node2
> 4. From Node2, create an HBase table (table name 't1' with one column family 'cf1')
> 5. Add around 367120 rows to the table
> 6. Scan table 't1' using the HBase shell and, at the same time, switch region servers 1 & 2 (so that the regions of table 't1' are moved from Region Server 1 to 2 and vice versa)
> 7. During this time the HBase shell fails partway through the scan operation, as shown below:
> ...................................................................                                
>  row172266                        column=cf1:a, timestamp=1379680737307, value=100                                              
>  row172267                        column=cf1:a, timestamp=1379680737311, value=100                                              
>  row172268                        column=cf1:a, timestamp=1379680737314, value=100                                              
>  row172269                        column=cf1:a, timestamp=1379680737317, value=100                                              
>  row17227                         column=cf1:a, timestamp=1379679668631, value=100                                              
>  row17227                         column=cf1:b, timestamp=1379681090560, value=200                                             
> ERROR: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions:
> Fri Sep 20 18:20:58 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
> Fri Sep 20 18:20:59 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: HOST-YY.YY.YY.YY/YY.YY.YY.YY:61020
> Fri Sep 20 18:21:00 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:01 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: HOST-YY.YY.YY.YY/YY.YY.YY.YY:61020
> Fri Sep 20 18:21:07 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:09 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
> Fri Sep 20 18:21:17 IST 2013, org.apache.hadoop.hbase.client.ScannerCallable@1999dc4f, java.net.ConnectException: Connection refused
> hbase(main):014:0> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)