Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2017/02/03 00:13:51 UTC

[jira] [Resolved] (HBASE-17570) rsgroup server move can get stuck if unassigning fails

     [ https://issues.apache.org/jira/browse/HBASE-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-17570.
---------------------------
    Resolution: Duplicate

Fixed by HBASE-17350

> rsgroup server move can get stuck if unassigning fails
> ------------------------------------------------------
>
>                 Key: HBASE-17570
>                 URL: https://issues.apache.org/jira/browse/HBASE-17570
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: stack
>             Fix For: 2.0.0
>
>
> This is easy to reproduce in a standalone setup on the master branch. The master branch has the 'fake' Master regionserver, which shows up as a regionserver in the rsgroup 'default' group. If I create a new group and then try to move servers into it, the move usually gets stuck in the loop below and never breaks out (I have to kill the master).
> Looking at the code, RSGroupAdminServer#moveServers has a loop that will go on forever; there is neither a timeout nor a maximum number of tries.
> Maybe we don't see this much in a 'real' cluster. Filing this issue in the meantime because the move needs to stop retrying forever and fail instead.
> {code}
> 2017-01-30 21:34:46,340 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] rsgroup.RSGroupAdminServer: Unassigning 1 regions from server localhost:50143 for move to xx
> 2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OPEN, ts=1485840806167, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=PENDING_CLOSE
> 2017-01-30 21:34:46,347 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50143] regionserver.RSRpcServices: Close 8ebaa5bd7a2e906429a7b91bb2bee333 without moving
> 2017-01-30 21:34:46,348 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Flushing 1/1 column families, memstore=431 B
> 2017-01-30 21:34:46,406 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=7, memsize=431, hasBloomFilter=true, into tmp file file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/.tmp/m/999d93adf36b4406bb73dc64e0158a05
> 2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HStore: Added file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/m/999d93adf36b4406bb73dc64e0158a05, entries=2, sequenceid=7, filesize=4.9 K
> 2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Finished memstore flush of ~431 B/431, currentsize=0 B/0 for region hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. in 74ms, sequenceid=7, compaction requested=false
> 2017-01-30 21:34:46,425 INFO  [StoreCloserThread-hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.-1] regionserver.HStore: Closed m
> 2017-01-30 21:34:46,437 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Closed hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
> 2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=CLOSED
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to assign to
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Can't find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333, NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY => '', ENDKEY => ''}
> 2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161, set to FAILED_OPEN
> 2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=FAILED_OPEN
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /0:0:0:0:0:0:0:1:50272
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Refusing session request for client /0:0:0:0:0:0:0:1:50272 as it has seen zxid 0x25e our last zxid is 0xae client must try another server
> 2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:50272 (no session established for client)
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] rsgroup.RSGroupAdminServer: Unassigning 2 regions from server localhost:50143 for move to xx
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=OFFLINE
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to assign to
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.AssignmentManager: Can't find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333, NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY => '', ENDKEY => ''}
> 2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161, set to FAILED_OPEN
> 2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=FAILED_OPEN
> 2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887356, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=OFFLINE
> {code}
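The description asks for the drain loop in moveServers to stop retrying forever and fail the move. Below is a minimal hedged sketch of such a bounded loop. It is not the HBase code and not necessarily what HBASE-17350 did. `BoundedMoveSketch` and `waitForServerToDrain` are hypothetical names. The `IntSupplier` callback is an assumed stand-in for "run one unassign pass and report how many regions are still on the server". The loop gives up after a maximum number of attempts or once a wall-clock deadline passes, whichever comes first.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch of a bounded "unassign until the server is empty" loop.
// None of these names are part of the real HBase RSGroupAdminServer API.
public class BoundedMoveSketch {

    /**
     * Runs unassign passes until the server reports zero regions.
     * Gives up after maxTries passes (at least one pass always runs) or once
     * timeoutMillis has elapsed, whichever comes first.
     *
     * @param unassignPass hypothetical callback: triggers one unassign pass and
     *                     returns how many regions are still on the server
     * @return the number of passes it took to drain the server
     * @throws IllegalStateException if the server did not drain in time
     */
    public static int waitForServerToDrain(IntSupplier unassignPass, int maxTries,
                                           long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int attempt = 0;
        while (true) {
            attempt++;
            int remaining = unassignPass.getAsInt();
            if (remaining == 0) {
                return attempt; // server is empty; the move can proceed
            }
            // Check the budget before sleeping; the subtraction form is overflow-safe for nanoTime.
            if (attempt >= maxTries || System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(remaining + " region(s) still on server after "
                        + attempt + " attempt(s); failing the move");
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for server to drain", e);
            }
        }
    }

    public static void main(String[] args) {
        // Stuck case, as in the log above: the hbase:rsgroup region can never find a
        // destination, so every pass still sees one region on the server.
        try {
            waitForServerToDrain(() -> 1, 5, 10_000, 10);
            System.out.println("unexpected: move succeeded");
        } catch (IllegalStateException e) {
            System.out.println("move failed: " + e.getMessage());
        }

        // Healthy case: the callback reports 2, then 1, then 0 remaining regions.
        AtomicInteger left = new AtomicInteger(2);
        int passes = waitForServerToDrain(left::getAndDecrement, 5, 10_000, 10);
        System.out.println("drained after " + passes + " passes");
    }
}
```

Running `main` should report that the stuck move fails after 5 attempts and that the healthy case drains after 3 passes. Bounding by both attempts and time is a design choice in this sketch. The attempt cap stops fast-failing passes, and the deadline stops slow ones.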



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)