You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Jerry He <je...@gmail.com> on 2014/02/09 23:12:17 UTC

taking a hour to get a spitted region online? 0.96.0

Hi, folks

This is what I am seeing when running in a stress env.   I am getting
"RetriesExhaustedWithDetailsException: Failed 748 actions:
NotServingRegionException"
On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE to *
*SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from **SPLITTING_NEW to
OPEN.*
Am I seeing anything wrong here?

2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group FileSystemCounters
is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
*2014-02-08 20:50:44,824* WARN org.apache.hadoop.hbase.client.AsyncProcess:
Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com,60020,1391887547473
NOT resubmitting.
region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
2014-02-08 20:50:44,839 INFO org.apache.sqoop.mapreduce.AutoProgressMapper:
Auto-progress thread is finished. keepGoing=false
2014-02-08 20:50:44,842 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-02-08 20:50:44,858 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hive (auth:SIMPLE)
cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
Failed 748 actions: NotServingRegionException*: 748 times,
2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error running
child
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
748 actions: NotServingRegionException: 748 times,
at
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
at
org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
at
org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
at
org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)


*2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382, server=
hdtest208.svl.ibm.com,60020,1391887547473}


*2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44
state=SPLITTING_NEW,*ts=1391924474093, server=
hdtest208.svl.ibm.com,60020,1391887547473} to
{*010c1981882d1a59201af5e2dc589d44
state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
,60020,1391887547473
2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, server=
hdtest208.svl.ibm.com,60020,1391887547473}
2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
,60020,1391887547473
2014-02-08 21:41:14,100 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
daughter
a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
daughter
b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
on hdtest208.svl.ibm.com,60020,1391887547473

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Jerry He <je...@gmail.com>.
Opened a JIRA
https://issues.apache.org/jira/browse/HBASE-10492


On Sun, Feb 9, 2014 at 6:11 PM, Jerry He <je...@gmail.com> wrote:

> Hi, Ted
>
> The related time period on the region sever is:
>
> 2014-02-08 20:44:12,662 WARN
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file
> not found for region: 010c1981882d1a59201af5e2dc589d44
> 2014-02-08 20:44:12,666 WARN
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file
> not found for region: c2eb9b7971ca7f3fed3da86df5b788e7
>
> There were no INFO related to these two regions until: (at the end see
> this: *Split took 57mins, 16sec)*
>
> 2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
> 2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
> 2014-02-08 21:41:14,032 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Updated row
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> with server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,054 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open
> deploy task for
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,054 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Completed compaction of 10 file(s) in cf of
> tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.
> into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is
> 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to
> execute.
> 2014-02-08 21:41:14,059 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed
> compaction: Request =
> regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.,
> storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883,
> time=1391924373278861000; duration=1mins, 40sec
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on cf in region
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> 2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Starting compaction of 10 file(s) in cf of
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
> into
> tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp,
> totalSize=709.7 M
> 2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Updated row
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> with server=hdtest208.svl.ibm.com,60020,1391887547473
> 2014-02-08 21:41:14,066 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open
> deploy task for
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
> 2014-02-08 21:41:14,190 INFO
> org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta
> updated, and report to master.
> Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> new regions:
> tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7..
> *Split took 57mins, 16sec*
>
>
>
>
> On Sun, Feb 9, 2014 at 5:09 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Can you take a look at the server log for hdtest208.svl.ibm.com during
>> this
>> period of time ?
>>
>> Thanks
>>
>>
>> On Sun, Feb 9, 2014 at 3:23 PM, Jerry He <je...@gmail.com> wrote:
>>
>> > Hi, Ted, Jimmy
>> >
>> > There were not much activities showing up in the master log during the
>> time
>> > between.   (The BackupLogCleaner is our log cleaner plugin that only
>> checks
>> > to see if the logs can be deleted)
>> > There is no Info at all from
>> > *2014-02-08 20:59:48,659*
>> > *2014-02-08 21:21:47,445*
>> >
>> > 2014-02-08 20:35:55,910 INFO
>> org.apache.hadoop.hbase.master.CatalogJanitor:
>> > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2
>> > unreferenced parent region(s)
>> > *2014-02-08 20:43:57,382 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
>> > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
>> > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
>> > 2014-02-08 20:43:57,382 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
>> > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
>> > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
>> > ,60020,1391887547473}
>> > 2014-02-08 20:43:57,382 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
>> > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
>> > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
>> > server=
>> > hdtest208.svl.ibm.com,60020,1391887547473}
>> > 2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> > client connection, connectString=hdtest207.svl.ibm.com:2181,
>> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
>> > watcher=Backup Client
>> > 2014-02-08 20:46:47,893 INFO
>> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> > identifier=Backup Client connecting to ZooKeeper ensemble=
>> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
>> > hdtest205.svl.ibm.com:2181
>> > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening
>> > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181.
>> Will
>> > not attempt to authenticate using SASL (Unable to locate a login
>> > configuration)
>> > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket
>> > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
>> > initiating session
>> > 2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session
>> > establishment complete on server
>> hdtest205.svl.ibm.com/9.30.194.214:2181,
>> > sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000
>> > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session:
>> > 0x24412bcbe5d0684 closed
>> > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn:
>> EventThread
>> > shut down
>> > 2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
>> > Deleted
>> > tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c.
>> > 2014-02-08 20:50:57,955 INFO
>> org.apache.hadoop.hbase.master.CatalogJanitor:
>> > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
>> > unreferenced parent region(s)
>> > 2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> > client connection, connectString=hdtest207.svl.ibm.com:2181,
>> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
>> > watcher=Backup Client
>> > 2014-02-08 20:59:47,626 INFO
>> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> > identifier=Backup Client connecting to ZooKeeper ensemble=
>> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
>> > hdtest205.svl.ibm.com:2181
>> > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening
>> > socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181.
>> Will
>> > not attempt to authenticate using SASL (Unable to locate a login
>> > configuration)
>> > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket
>> > connection established to hdtest201.svl.ibm.com/9.30.194.210:2181,
>> > initiating session
>> > 2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session
>> > establishment complete on server
>> hdtest201.svl.ibm.com/9.30.194.210:2181,
>> > sessionid = 0x34412bcbe93070c, negotiated timeout = 60000
>> > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session:
>> > 0x34412bcbe93070c closed
>> > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn:
>> EventThread
>> > shut down
>> > 2014-02-08 20:59:47,641 INFO
>> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
>> > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
>> > %2C60020%2C1391887547473.1391919916110
>> > 2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> > client connection, connectString=hdtest207.svl.ibm.com:2181,
>> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
>> > watcher=Backup Client
>> > 2014-02-08 20:59:48,644 INFO
>> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> > identifier=Backup Client connecting to ZooKeeper ensemble=
>> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
>> > hdtest205.svl.ibm.com:2181
>> > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening
>> > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181.
>> Will
>> > not attempt to authenticate using SASL (Unable to locate a login
>> > configuration)
>> > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket
>> > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
>> > initiating session
>> > 2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session
>> > establishment complete on server
>> hdtest205.svl.ibm.com/9.30.194.214:2181,
>> > sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000
>> > 2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session:
>> > 0x24412bcbe5d06c9 closed
>> > 2014-02-08 20:59:48,659 INFO
>> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
>> > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
>> > %2C60020%2C1391887547473.1391920146024
>> > *2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn:
>> EventThread
>> > shut down
>> > *2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper:
>> Initiating
>> > client connection, connectString=hdtest207.svl.ibm.com:2181,
>> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
>> > watcher=Backup Client
>> > 2014-02-08 21:21:47,446 INFO
>> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> > identifier=Backup Client connecting to ZooKeeper ensemble=
>> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
>> > hdtest205.svl.ibm.com:2181
>> > 2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening
>> > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181.
>> Will
>> > not attempt to authenticate using SASL (Unable to locate a login
>> > configuration)
>> > 2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket
>> > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
>> > initiating session
>> > 2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session
>> > establishment complete on server
>> hdtest207.svl.ibm.com/9.30.194.216:2181,
>> > sessionid = 0x14412bcbe630833, negotiated timeout = 60000
>> > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session:
>> > 0x14412bcbe630833 closed
>> > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn:
>> EventThread
>> > shut down
>> > 2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> > client connection, connectString=hdtest207.svl.ibm.com:2181,
>> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
>> > watcher=Backup Client
>> > 2014-02-08 21:39:47,561 INFO
>> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> > identifier=Backup Client connecting to ZooKeeper ensemble=
>> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
>> > hdtest205.svl.ibm.com:2181
>> > 2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening
>> > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181.
>> Will
>> > not attempt to authenticate using SASL (Unable to locate a login
>> > configuration)
>> > 2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket
>> > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
>> > initiating session
>> > 2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session
>> > establishment complete on server
>> hdtest207.svl.ibm.com/9.30.194.216:2181,
>> > sessionid = 0x14412bcbe630880, negotiated timeout = 60000
>> > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session:
>> > 0x14412bcbe630880 closed
>> > 2014-02-08 21:39:47,577 INFO
>> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
>> > gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com
>> > %2C60020%2C1391887547484.1391917561641
>> > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn:
>> EventThread
>> > shut down
>> > *2014-02-08 21:41*:14,093 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
>> > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
>> > {b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server=
>> > hdtest208.svl.ibm.com,60020,1391887547473}
>> > 2014-02-08 21:41:14,093 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com
>> > ,60020,1391887547473
>> > 2*014-02-08 21:41:14,093 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW,
>> > ts=1391924474093, server=hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
>> > {010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093,
>> > server=hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
>> > *2014-02-08 21:41:14,093 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473*
>> > *2014-02-08 21:41:14,093 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
>> > ts=1391924474093, server=hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
>> > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
>> > server=hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
>> > *2014-02-08 21:41:14,094 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
>> > <http://hdtest208.svl.ibm.com>,60020,1391887547473*
>> > *2014-02-08 21:41:14,100 INFO
>> > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
>> >
>> >
>> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
>> > daughter
>> >
>> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
>> > daughter
>> >
>> >
>> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
>> > on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com
>> > >,60020,1391887547473*
>> > 2014-02-08 21:41:14,283 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE,
>> > ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6
>> > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
>> > ,60020,1391887547473}
>> > 2014-02-08 21:41:14,283 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE,
>> > ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3
>> > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
>> > ,60020,1391887547473}
>> > 2014-02-08 21:41:14,283 INFO
>> org.apache.hadoop.hbase.master.RegionStates:
>> > Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN,
>> > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
>> > {1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283,
>> > server=
>> > hdtest208.svl.ibm.com,60020,1391887547473}
>> >
>> >
>> >
>> > On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <jx...@cloudera.com>
>> wrote:
>> >
>> > > Was the master very busy? How many regions were in transition during
>> that
>> > > period?
>> > >
>> > >
>> > > On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get
>> > more
>> > > > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
>> > > > 010c1981882d1a59201af5e2dc589d44. ?
>> > > >
>> > > > Cheers
>> > > >
>> > > >
>> > > > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi, folks
>> > > > >
>> > > > > This is what I am seeing when running in a stress env.   I am
>> getting
>> > > > > "RetriesExhaustedWithDetailsException: Failed 748 actions:
>> > > > > NotServingRegionException"
>> > > > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE
>> > to *
>> > > > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from
>> > > **SPLITTING_NEW
>> > > > to
>> > > > > OPEN.*
>> > > > > Am I seeing anything wrong here?
>> > > > >
>> > > > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group
>> > > FileSystemCounters
>> > > > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter
>> > > instead
>> > > > > *2014-02-08 20:50:44,824* WARN
>> > > > org.apache.hadoop.hbase.client.AsyncProcess:
>> > > > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473
>> > > > > NOT resubmitting.
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
>> > > > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
>> > > > > 2014-02-08 20:50:44,839 INFO
>> > > > org.apache.sqoop.mapreduce.AutoProgressMapper:
>> > > > > Auto-progress thread is finished. keepGoing=false
>> > > > > 2014-02-08 20:50:44,842 INFO
>> > > org.apache.hadoop.mapred.TaskLogsTruncater:
>> > > > > Initializing logs' truncater with mapRetainSize=-1 and
>> > > > reduceRetainSize=-1
>> > > > > 2014-02-08 20:50:44,858 ERROR
>> > > > > org.apache.hadoop.security.UserGroupInformation:
>> > > > PriviledgedActionException
>> > > > > as:hive (auth:SIMPLE)
>> > > > >
>> > > >
>> > >
>> >
>> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
>> > > > > Failed 748 actions: NotServingRegionException*: 748 times,
>> > > > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
>> > > > running
>> > > > > child
>> > > > >
>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> > > > Failed
>> > > > > 748 actions: NotServingRegionException: 748 times,
>> > > > > at
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
>> > > > > at
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
>> > > > > at
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
>> > > > > at
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
>> > > > >
>> > > > >
>> > > > > *2014-02-08 20:43:57,382 INFO
>> > > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
>> > > > > ts=1391921037382, server=null} to
>> {010c1981882d1a59201af5e2dc589d44
>> > > > > state=SPLITTING_NEW*, ts=1391921037382, server=
>> hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473}
>> > > > > 2014-02-08 20:43:57,382 INFO
>> > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
>> > > > > ts=1391921037382, server=null} to
>> {c2eb9b7971ca7f3fed3da86df5b788e7
>> > > > > state=SPLITTING_NEW, ts=1391921037382, server=
>> hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473}
>> > > > > 2014-02-08 20:43:57,382 INFO
>> > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
>> > > > > ts=1391918522959, server=hdtest208.svl.ibm.com
>> ,60020,1391887547473}
>> > to
>> > > > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
>> ts=1391921037382,
>> > > > > server=
>> > > > > hdtest208.svl.ibm.com,60020,1391887547473}
>> > > > >
>> > > > >
>> > > > > *2014-02-08 21:41:14,093 INFO
>> > > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Transitioned {010c1981882d1a59201af5e2dc589d44
>> > > > > state=SPLITTING_NEW,*ts=1391924474093, server=
>> > > > > hdtest208.svl.ibm.com,60020,1391887547473} to
>> > > > > {*010c1981882d1a59201af5e2dc589d44
>> > > > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473}
>> > > > > 2014-02-08 21:41:14,093 INFO
>> > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473
>> > > > > 2014-02-08 21:41:14,093 INFO
>> > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7
>> state=SPLITTING_NEW,
>> > > > > ts=1391924474093, server=hdtest208.svl.ibm.com
>> ,60020,1391887547473}
>> > to
>> > > > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
>> > server=
>> > > > > hdtest208.svl.ibm.com,60020,1391887547473}
>> > > > > 2014-02-08 21:41:14,094 INFO
>> > > org.apache.hadoop.hbase.master.RegionStates:
>> > > > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
>> > > > > ,60020,1391887547473
>> > > > > 2014-02-08 21:41:14,100 INFO
>> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT
>> > event;
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
>> > > > > daughter
>> > > > >
>> > > >
>> > >
>> >
>> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
>> > > > > daughter
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
>> > > > > on hdtest208.svl.ibm.com,60020,1391887547473
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Jerry He <je...@gmail.com>.
Hi, Ted

The related time period on the region sever is:

2014-02-08 20:44:12,662 WARN
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file
not found for region: 010c1981882d1a59201af5e2dc589d44
2014-02-08 20:44:12,666 WARN
org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file
not found for region: c2eb9b7971ca7f3fed3da86df5b788e7

There were no INFO related to these two regions until: (at the end see
this: *Split took 57mins, 16sec)*

2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
2014-02-08 21:41:14,032 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
for
region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row
tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,054 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open
deploy task for
tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
for
region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
Completed compaction of 10 file(s) in cf of
tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.
into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is
2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to
execute.
2014-02-08 21:41:14,059 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed
compaction: Request =
regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c.,
storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883,
time=1391924373278861000; duration=1mins, 40sec
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on cf in region
tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore:
Starting compaction of 10 file(s) in cf of
tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
into
tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp,
totalSize=709.7 M
2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row
tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,066 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open
deploy task for
tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,190 INFO
org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta
updated, and report to master.
Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
new regions:
tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7..
*Split took 57mins, 16sec*




On Sun, Feb 9, 2014 at 5:09 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you take a look at the server log for hdtest208.svl.ibm.com during
> this
> period of time ?
>
> Thanks
>
>
> On Sun, Feb 9, 2014 at 3:23 PM, Jerry He <je...@gmail.com> wrote:
>
> > Hi, Ted, Jimmy
> >
> > There were not much activities showing up in the master log during the
> time
> > between.   (The BackupLogCleaner is our log cleaner plugin that only
> checks
> > to see if the logs can be deleted)
> > There is no Info at all from
> > *2014-02-08 20:59:48,659*
> > *2014-02-08 21:21:47,445*
> >
> > 2014-02-08 20:35:55,910 INFO
> org.apache.hadoop.hbase.master.CatalogJanitor:
> > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2
> > unreferenced parent region(s)
> > *2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> > server=
> > hdtest208.svl.ibm.com,60020,1391887547473}
> > 2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > client connection, connectString=hdtest207.svl.ibm.com:2181,
> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
> > watcher=Backup Client
> > 2014-02-08 20:46:47,893 INFO
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > identifier=Backup Client connecting to ZooKeeper ensemble=
> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> > hdtest205.svl.ibm.com:2181
> > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181.
> Will
> > not attempt to authenticate using SASL (Unable to locate a login
> > configuration)
> > 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
> > initiating session
> > 2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session
> > establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181
> ,
> > sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000
> > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session:
> > 0x24412bcbe5d0684 closed
> > 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn: EventThread
> > shut down
> > 2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> > Deleted
> > tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c.
> > 2014-02-08 20:50:57,955 INFO
> org.apache.hadoop.hbase.master.CatalogJanitor:
> > Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
> > unreferenced parent region(s)
> > 2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > client connection, connectString=hdtest207.svl.ibm.com:2181,
> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
> > watcher=Backup Client
> > 2014-02-08 20:59:47,626 INFO
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > identifier=Backup Client connecting to ZooKeeper ensemble=
> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> > hdtest205.svl.ibm.com:2181
> > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181.
> Will
> > not attempt to authenticate using SASL (Unable to locate a login
> > configuration)
> > 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to hdtest201.svl.ibm.com/9.30.194.210:2181,
> > initiating session
> > 2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session
> > establishment complete on server hdtest201.svl.ibm.com/9.30.194.210:2181
> ,
> > sessionid = 0x34412bcbe93070c, negotiated timeout = 60000
> > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session:
> > 0x34412bcbe93070c closed
> > 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn: EventThread
> > shut down
> > 2014-02-08 20:59:47,641 INFO
> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
> > %2C60020%2C1391887547473.1391919916110
> > 2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > client connection, connectString=hdtest207.svl.ibm.com:2181,
> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
> > watcher=Backup Client
> > 2014-02-08 20:59:48,644 INFO
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > identifier=Backup Client connecting to ZooKeeper ensemble=
> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> > hdtest205.svl.ibm.com:2181
> > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181.
> Will
> > not attempt to authenticate using SASL (Unable to locate a login
> > configuration)
> > 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
> > initiating session
> > 2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session
> > establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181
> ,
> > sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000
> > 2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session:
> > 0x24412bcbe5d06c9 closed
> > 2014-02-08 20:59:48,659 INFO
> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> > gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
> > %2C60020%2C1391887547473.1391920146024
> > *2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> > shut down
> > *2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper: Initiating
> > client connection, connectString=hdtest207.svl.ibm.com:2181,
> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
> > watcher=Backup Client
> > 2014-02-08 21:21:47,446 INFO
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > identifier=Backup Client connecting to ZooKeeper ensemble=
> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> > hdtest205.svl.ibm.com:2181
> > 2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181.
> Will
> > not attempt to authenticate using SASL (Unable to locate a login
> > configuration)
> > 2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
> > initiating session
> > 2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session
> > establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181
> ,
> > sessionid = 0x14412bcbe630833, negotiated timeout = 60000
> > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session:
> > 0x14412bcbe630833 closed
> > 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn: EventThread
> > shut down
> > 2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > client connection, connectString=hdtest207.svl.ibm.com:2181,
> > hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181sessionTimeout=90000
> > watcher=Backup Client
> > 2014-02-08 21:39:47,561 INFO
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > identifier=Backup Client connecting to ZooKeeper ensemble=
> > hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> > hdtest205.svl.ibm.com:2181
> > 2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181.
> Will
> > not attempt to authenticate using SASL (Unable to locate a login
> > configuration)
> > 2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
> > initiating session
> > 2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session
> > establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181
> ,
> > sessionid = 0x14412bcbe630880, negotiated timeout = 60000
> > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session:
> > 0x14412bcbe630880 closed
> > 2014-02-08 21:39:47,577 INFO
> > com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> > gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com
> > %2C60020%2C1391887547484.1391917561641
> > 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn: EventThread
> > shut down
> > *2014-02-08 21:41*:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
> > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > {b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server=
> > hdtest208.svl.ibm.com,60020,1391887547473}
> > 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com
> > ,60020,1391887547473
> > 2*014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW,
> > ts=1391924474093, server=hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
> > {010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093,
> > server=hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> > *2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473*
> > *2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > ts=1391924474093, server=hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
> > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
> > server=hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> > *2014-02-08 21:41:14,094 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > <http://hdtest208.svl.ibm.com>,60020,1391887547473*
> > *2014-02-08 21:41:14,100 INFO
> > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
> >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > daughter
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > daughter
> >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com
> > >,60020,1391887547473*
> > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE,
> > ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6
> > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE,
> > ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3
> > state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN,
> > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > {1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283,
> > server=
> > hdtest208.svl.ibm.com,60020,1391887547473}
> >
> >
> >
> > On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <jx...@cloudera.com> wrote:
> >
> > > Was the master very busy? How many regions were in transition during
> that
> > > period?
> > >
> > >
> > > On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get
> > more
> > > > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
> > > > 010c1981882d1a59201af5e2dc589d44. ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com> wrote:
> > > >
> > > > > Hi, folks
> > > > >
> > > > > This is what I am seeing when running in a stress env.   I am
> getting
> > > > > "RetriesExhaustedWithDetailsException: Failed 748 actions:
> > > > > NotServingRegionException"
> > > > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE
> > to *
> > > > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from
> > > **SPLITTING_NEW
> > > > to
> > > > > OPEN.*
> > > > > Am I seeing anything wrong here?
> > > > >
> > > > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group
> > > FileSystemCounters
> > > > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter
> > > instead
> > > > > *2014-02-08 20:50:44,824* WARN
> > > > org.apache.hadoop.hbase.client.AsyncProcess:
> > > > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473
> > > > > NOT resubmitting.
> > > > >
> > > > >
> > > >
> > >
> >
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> > > > > 2014-02-08 20:50:44,839 INFO
> > > > org.apache.sqoop.mapreduce.AutoProgressMapper:
> > > > > Auto-progress thread is finished. keepGoing=false
> > > > > 2014-02-08 20:50:44,842 INFO
> > > org.apache.hadoop.mapred.TaskLogsTruncater:
> > > > > Initializing logs' truncater with mapRetainSize=-1 and
> > > > reduceRetainSize=-1
> > > > > 2014-02-08 20:50:44,858 ERROR
> > > > > org.apache.hadoop.security.UserGroupInformation:
> > > > PriviledgedActionException
> > > > > as:hive (auth:SIMPLE)
> > > > >
> > > >
> > >
> >
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> > > > > Failed 748 actions: NotServingRegionException*: 748 times,
> > > > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
> > > > running
> > > > > child
> > > > >
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > Failed
> > > > > 748 actions: NotServingRegionException: 748 times,
> > > > > at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> > > > > at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> > > > > at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> > > > > at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> > > > >
> > > > >
> > > > > *2014-02-08 20:43:57,382 INFO
> > > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > > > > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > > > > state=SPLITTING_NEW*, ts=1391921037382, server=
> hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473}
> > > > > 2014-02-08 20:43:57,382 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > > > > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > > > > state=SPLITTING_NEW, ts=1391921037382, server=
> hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473}
> > > > > 2014-02-08 20:43:57,382 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > > > > ts=1391918522959, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> > to
> > > > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
> ts=1391921037382,
> > > > > server=
> > > > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > > >
> > > > >
> > > > > *2014-02-08 21:41:14,093 INFO
> > > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Transitioned {010c1981882d1a59201af5e2dc589d44
> > > > > state=SPLITTING_NEW,*ts=1391924474093, server=
> > > > > hdtest208.svl.ibm.com,60020,1391887547473} to
> > > > > {*010c1981882d1a59201af5e2dc589d44
> > > > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473}
> > > > > 2014-02-08 21:41:14,093 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473
> > > > > 2014-02-08 21:41:14,093 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > > > > ts=1391924474093, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> > to
> > > > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
> > server=
> > > > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > > > 2014-02-08 21:41:14,094 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > > > > ,60020,1391887547473
> > > > > 2014-02-08 21:41:14,100 INFO
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT
> > event;
> > > > >
> > > > >
> > > >
> > >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > > > > daughter
> > > > >
> > > >
> > >
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > > > daughter
> > > > >
> > > > >
> > > >
> > >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > > > > on hdtest208.svl.ibm.com,60020,1391887547473
> > > > >
> > > >
> > >
> >
>

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Ted Yu <yu...@gmail.com>.
Can you take a look at the server log for hdtest208.svl.ibm.com during this
period of time ?

Thanks


On Sun, Feb 9, 2014 at 3:23 PM, Jerry He <je...@gmail.com> wrote:

> Hi, Ted, Jimmy
>
> There were not much activities showing up in the master log during the time
> between.   (The BackupLogCleaner is our log cleaner plugin that only checks
> to see if the logs can be deleted)
> There is no Info at all from
> *2014-02-08 20:59:48,659*
> *2014-02-08 21:21:47,445*
>
> 2014-02-08 20:35:55,910 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
> Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2
> unreferenced parent region(s)
> *2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> server=
> hdtest208.svl.ibm.com,60020,1391887547473}
> 2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hdtest207.svl.ibm.com:2181,
> hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
> watcher=Backup Client
> 2014-02-08 20:46:47,893 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=Backup Client connecting to ZooKeeper ensemble=
> hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> hdtest205.svl.ibm.com:2181
> 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
> not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
> initiating session
> 2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
> sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000
> 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x24412bcbe5d0684 closed
> 2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted
> tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c.
> 2014-02-08 20:50:57,955 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
> Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
> unreferenced parent region(s)
> 2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hdtest207.svl.ibm.com:2181,
> hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
> watcher=Backup Client
> 2014-02-08 20:59:47,626 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=Backup Client connecting to ZooKeeper ensemble=
> hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> hdtest205.svl.ibm.com:2181
> 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181. Will
> not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hdtest201.svl.ibm.com/9.30.194.210:2181,
> initiating session
> 2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hdtest201.svl.ibm.com/9.30.194.210:2181,
> sessionid = 0x34412bcbe93070c, negotiated timeout = 60000
> 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x34412bcbe93070c closed
> 2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-02-08 20:59:47,641 INFO
> com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
> %2C60020%2C1391887547473.1391919916110
> 2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hdtest207.svl.ibm.com:2181,
> hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
> watcher=Backup Client
> 2014-02-08 20:59:48,644 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=Backup Client connecting to ZooKeeper ensemble=
> hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> hdtest205.svl.ibm.com:2181
> 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
> not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
> initiating session
> 2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
> sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000
> 2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x24412bcbe5d06c9 closed
> 2014-02-08 20:59:48,659 INFO
> com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
> %2C60020%2C1391887547473.1391920146024
> *2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> *2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hdtest207.svl.ibm.com:2181,
> hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
> watcher=Backup Client
> 2014-02-08 21:21:47,446 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=Backup Client connecting to ZooKeeper ensemble=
> hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> hdtest205.svl.ibm.com:2181
> 2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
> not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
> initiating session
> 2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
> sessionid = 0x14412bcbe630833, negotiated timeout = 60000
> 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x14412bcbe630833 closed
> 2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hdtest207.svl.ibm.com:2181,
> hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
> watcher=Backup Client
> 2014-02-08 21:39:47,561 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=Backup Client connecting to ZooKeeper ensemble=
> hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
> hdtest205.svl.ibm.com:2181
> 2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
> not attempt to authenticate using SASL (Unable to locate a login
> configuration)
> 2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
> initiating session
> 2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
> sessionid = 0x14412bcbe630880, negotiated timeout = 60000
> 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x14412bcbe630880 closed
> 2014-02-08 21:39:47,577 INFO
> com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
> gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com
> %2C60020%2C1391887547484.1391917561641
> 2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> *2014-02-08 21:41*:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
> ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> {b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server=
> hdtest208.svl.ibm.com,60020,1391887547473}
> 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com
> ,60020,1391887547473
> 2*014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW,
> ts=1391924474093, server=hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
> {010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093,
> server=hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> *2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473*
> *2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> ts=1391924474093, server=hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473} to
> {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
> server=hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473}*
> *2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
> Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> <http://hdtest208.svl.ibm.com>,60020,1391887547473*
> *2014-02-08 21:41:14,100 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
>
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> daughter
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> daughter
>
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com
> >,60020,1391887547473*
> 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE,
> ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6
> state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE,
> ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3
> state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN,
> ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> {1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283,
> server=
> hdtest208.svl.ibm.com,60020,1391887547473}
>
>
>
> On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <jx...@cloudera.com> wrote:
>
> > Was the master very busy? How many regions were in transition during that
> > period?
> >
> >
> > On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get
> more
> > > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
> > > 010c1981882d1a59201af5e2dc589d44. ?
> > >
> > > Cheers
> > >
> > >
> > > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com> wrote:
> > >
> > > > Hi, folks
> > > >
> > > > This is what I am seeing when running in a stress env.   I am getting
> > > > "RetriesExhaustedWithDetailsException: Failed 748 actions:
> > > > NotServingRegionException"
> > > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE
> to *
> > > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from
> > **SPLITTING_NEW
> > > to
> > > > OPEN.*
> > > > Am I seeing anything wrong here?
> > > >
> > > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group
> > FileSystemCounters
> > > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter
> > instead
> > > > *2014-02-08 20:50:44,824* WARN
> > > org.apache.hadoop.hbase.client.AsyncProcess:
> > > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> > > > ,60020,1391887547473
> > > > NOT resubmitting.
> > > >
> > > >
> > >
> >
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> > > > 2014-02-08 20:50:44,839 INFO
> > > org.apache.sqoop.mapreduce.AutoProgressMapper:
> > > > Auto-progress thread is finished. keepGoing=false
> > > > 2014-02-08 20:50:44,842 INFO
> > org.apache.hadoop.mapred.TaskLogsTruncater:
> > > > Initializing logs' truncater with mapRetainSize=-1 and
> > > reduceRetainSize=-1
> > > > 2014-02-08 20:50:44,858 ERROR
> > > > org.apache.hadoop.security.UserGroupInformation:
> > > PriviledgedActionException
> > > > as:hive (auth:SIMPLE)
> > > >
> > >
> >
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> > > > Failed 748 actions: NotServingRegionException*: 748 times,
> > > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
> > > running
> > > > child
> > > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > Failed
> > > > 748 actions: NotServingRegionException: 748 times,
> > > > at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> > > > at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> > > > at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> > > > at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> > > >
> > > >
> > > > *2014-02-08 20:43:57,382 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > > > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > > > state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > > ,60020,1391887547473}
> > > > 2014-02-08 20:43:57,382 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > > > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > > > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > > ,60020,1391887547473}
> > > > 2014-02-08 20:43:57,382 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > > > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473}
> to
> > > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> > > > server=
> > > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > >
> > > >
> > > > *2014-02-08 21:41:14,093 INFO
> > > org.apache.hadoop.hbase.master.RegionStates:
> > > > Transitioned {010c1981882d1a59201af5e2dc589d44
> > > > state=SPLITTING_NEW,*ts=1391924474093, server=
> > > > hdtest208.svl.ibm.com,60020,1391887547473} to
> > > > {*010c1981882d1a59201af5e2dc589d44
> > > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> > > > ,60020,1391887547473}
> > > > 2014-02-08 21:41:14,093 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > > > ,60020,1391887547473
> > > > 2014-02-08 21:41:14,093 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > > > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473}
> to
> > > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
> server=
> > > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > > 2014-02-08 21:41:14,094 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > > > ,60020,1391887547473
> > > > 2014-02-08 21:41:14,100 INFO
> > > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT
> event;
> > > >
> > > >
> > >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > > > daughter
> > > >
> > >
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > > daughter
> > > >
> > > >
> > >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > > > on hdtest208.svl.ibm.com,60020,1391887547473
> > > >
> > >
> >
>

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Jerry He <je...@gmail.com>.
Hi, Ted, Jimmy

There were not much activities showing up in the master log during the time
between.   (The BackupLogCleaner is our log cleaner plugin that only checks
to see if the logs can be deleted)
There is no Info at all from
*2014-02-08 20:59:48,659*
*2014-02-08 21:21:47,445*

2014-02-08 20:35:55,910 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 2
unreferenced parent region(s)
*2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382, server=
hdtest208.svl.ibm.com,60020,1391887547473}
2014-02-08 20:46:47,892 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:46:47,893 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:46:47,894 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
initiating session
2014-02-08 20:46:47,910 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
sessionid = 0x24412bcbe5d0684, negotiated timeout = 60000
2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ZooKeeper: Session:
0x24412bcbe5d0684 closed
2014-02-08 20:46:47,919 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 20:50:56,276 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted
tpch_hb_1000_2.lineitem,,1391914553504.de69c2df235d7307931b08206b2e270c.
2014-02-08 20:50:57,955 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
Scanned 75 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
unreferenced parent region(s)
2014-02-08 20:59:47,626 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:59:47,626 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest201.svl.ibm.com/9.30.194.210:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:59:47,627 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest201.svl.ibm.com/9.30.194.210:2181,
initiating session
2014-02-08 20:59:47,633 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest201.svl.ibm.com/9.30.194.210:2181,
sessionid = 0x34412bcbe93070c, negotiated timeout = 60000
2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ZooKeeper: Session:
0x34412bcbe93070c closed
2014-02-08 20:59:47,641 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 20:59:47,641 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
%2C60020%2C1391887547473.1391919916110
2014-02-08 20:59:48,643 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 20:59:48,644 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest205.svl.ibm.com/9.30.194.214:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 20:59:48,645 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest205.svl.ibm.com/9.30.194.214:2181,
initiating session
2014-02-08 20:59:48,651 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest205.svl.ibm.com/9.30.194.214:2181,
sessionid = 0x24412bcbe5d06c9, negotiated timeout = 60000
2014-02-08 20:59:48,659 INFO org.apache.zookeeper.ZooKeeper: Session:
0x24412bcbe5d06c9 closed
2014-02-08 20:59:48,659 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest208.svl.ibm.com
%2C60020%2C1391887547473.1391920146024
*2014-02-08 20:59:48,659* INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
*2014-02-08 21:21:47,445* INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 21:21:47,446 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 21:21:47,446 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 21:21:47,447 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
initiating session
2014-02-08 21:21:47,470 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
sessionid = 0x14412bcbe630833, negotiated timeout = 60000
2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ZooKeeper: Session:
0x14412bcbe630833 closed
2014-02-08 21:21:47,479 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-02-08 21:39:47,560 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hdtest207.svl.ibm.com:2181,
hdtest201.svl.ibm.com:2181,hdtest205.svl.ibm.com:2181 sessionTimeout=90000
watcher=Backup Client
2014-02-08 21:39:47,561 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=Backup Client connecting to ZooKeeper ensemble=
hdtest207.svl.ibm.com:2181,hdtest201.svl.ibm.com:2181,
hdtest205.svl.ibm.com:2181
2014-02-08 21:39:47,561 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hdtest207.svl.ibm.com/9.30.194.216:2181. Will
not attempt to authenticate using SASL (Unable to locate a login
configuration)
2014-02-08 21:39:47,562 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hdtest207.svl.ibm.com/9.30.194.216:2181,
initiating session
2014-02-08 21:39:47,569 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hdtest207.svl.ibm.com/9.30.194.216:2181,
sessionid = 0x14412bcbe630880, negotiated timeout = 60000
2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ZooKeeper: Session:
0x14412bcbe630880 closed
2014-02-08 21:39:47,577 INFO
com.ibm.biginsights.hbase.backup.BackupLogCleaner: ok to delete:
gpfs:/hbase/oldWALs/hdtest203.svl.ibm.com
%2C60020%2C1391887547484.1391917561641
2014-02-08 21:39:47,577 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
*2014-02-08 21:41*:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {b576e8db65d56ec08db5ca900587c28d state=SPLITTING,
ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{b576e8db65d56ec08db5ca900587c28d state=SPLIT, ts=1391924474093, server=
hdtest208.svl.ibm.com,60020,1391887547473}
2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Offlined b576e8db65d56ec08db5ca900587c28d from hdtest208.svl.ibm.com
,60020,1391887547473
2*014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {010c1981882d1a59201af5e2dc589d44 state=SPLITTING_NEW,
ts=1391924474093, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473} to
{010c1981882d1a59201af5e2dc589d44 state=OPEN, ts=1391924474093,
server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
*2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473*
*2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
ts=1391924474093, server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473} to
{c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093,
server=hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473}*
*2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
<http://hdtest208.svl.ibm.com>,60020,1391887547473*
*2014-02-08 21:41:14,100 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
daughter
a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
daughter
b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
on hdtest208.svl.ibm.com <http://hdtest208.svl.ibm.com>,60020,1391887547473*
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {4ae7fe11b10dde440d1bdcaf5346edd6 state=OFFLINE,
ts=1391924474283, server=null} to {4ae7fe11b10dde440d1bdcaf5346edd6
state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {d80ed7b81921c083dea41044b2ee9ab3 state=OFFLINE,
ts=1391924474283, server=null} to {d80ed7b81921c083dea41044b2ee9ab3
state=SPLITTING_NEW, ts=1391924474283, server=hdtest208.svl.ibm.com
,60020,1391887547473}
2014-02-08 21:41:14,283 INFO org.apache.hadoop.hbase.master.RegionStates:
Transitioned {1fbcfc0a792435dfd73ec5b0ef5c953c state=OPEN,
ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
{1fbcfc0a792435dfd73ec5b0ef5c953c state=SPLITTING, ts=1391924474283, server=
hdtest208.svl.ibm.com,60020,1391887547473}



On Sun, Feb 9, 2014 at 2:33 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> Was the master very busy? How many regions were in transition during that
> period?
>
>
> On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Can you pastebin master log from 20:40:00 to 21:42:00 so that we get more
> > context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
> > 010c1981882d1a59201af5e2dc589d44. ?
> >
> > Cheers
> >
> >
> > On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com> wrote:
> >
> > > Hi, folks
> > >
> > > This is what I am seeing when running in a stress env.   I am getting
> > > "RetriesExhaustedWithDetailsException: Failed 748 actions:
> > > NotServingRegionException"
> > > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE to *
> > > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from
> **SPLITTING_NEW
> > to
> > > OPEN.*
> > > Am I seeing anything wrong here?
> > >
> > > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group
> FileSystemCounters
> > > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter
> instead
> > > *2014-02-08 20:50:44,824* WARN
> > org.apache.hadoop.hbase.client.AsyncProcess:
> > > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > NOT resubmitting.
> > >
> > >
> >
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> > > 2014-02-08 20:50:44,839 INFO
> > org.apache.sqoop.mapreduce.AutoProgressMapper:
> > > Auto-progress thread is finished. keepGoing=false
> > > 2014-02-08 20:50:44,842 INFO
> org.apache.hadoop.mapred.TaskLogsTruncater:
> > > Initializing logs' truncater with mapRetainSize=-1 and
> > reduceRetainSize=-1
> > > 2014-02-08 20:50:44,858 ERROR
> > > org.apache.hadoop.security.UserGroupInformation:
> > PriviledgedActionException
> > > as:hive (auth:SIMPLE)
> > >
> >
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> > > Failed 748 actions: NotServingRegionException*: 748 times,
> > > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
> > running
> > > child
> > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > Failed
> > > 748 actions: NotServingRegionException: 748 times,
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> > >
> > >
> > > *2014-02-08 20:43:57,382 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > > state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> > > server=
> > > hdtest208.svl.ibm.com,60020,1391887547473}
> > >
> > >
> > > *2014-02-08 21:41:14,093 INFO
> > org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {010c1981882d1a59201af5e2dc589d44
> > > state=SPLITTING_NEW,*ts=1391924474093, server=
> > > hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {*010c1981882d1a59201af5e2dc589d44
> > > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> > > ,60020,1391887547473}
> > > 2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > 2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, server=
> > > hdtest208.svl.ibm.com,60020,1391887547473}
> > > 2014-02-08 21:41:14,094 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > > ,60020,1391887547473
> > > 2014-02-08 21:41:14,100 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
> > >
> > >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > > daughter
> > >
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > > daughter
> > >
> > >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > > on hdtest208.svl.ibm.com,60020,1391887547473
> > >
> >
>

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Jimmy Xiang <jx...@cloudera.com>.
Was the master very busy? How many regions were in transition during that
period?


On Sun, Feb 9, 2014 at 2:31 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you pastebin master log from 20:40:00 to 21:42:00 so that we get more
> context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
> 010c1981882d1a59201af5e2dc589d44. ?
>
> Cheers
>
>
> On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com> wrote:
>
> > Hi, folks
> >
> > This is what I am seeing when running in a stress env.   I am getting
> > "RetriesExhaustedWithDetailsException: Failed 748 actions:
> > NotServingRegionException"
> > On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE to *
> > *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from **SPLITTING_NEW
> to
> > OPEN.*
> > Am I seeing anything wrong here?
> >
> > 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group FileSystemCounters
> > is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> > *2014-02-08 20:50:44,824* WARN
> org.apache.hadoop.hbase.client.AsyncProcess:
> > Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> > ,60020,1391887547473
> > NOT resubmitting.
> >
> >
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> > 2014-02-08 20:50:44,839 INFO
> org.apache.sqoop.mapreduce.AutoProgressMapper:
> > Auto-progress thread is finished. keepGoing=false
> > 2014-02-08 20:50:44,842 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
> > Initializing logs' truncater with mapRetainSize=-1 and
> reduceRetainSize=-1
> > 2014-02-08 20:50:44,858 ERROR
> > org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException
> > as:hive (auth:SIMPLE)
> >
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> > Failed 748 actions: NotServingRegionException*: 748 times,
> > 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error
> running
> > child
> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed
> > 748 actions: NotServingRegionException: 748 times,
> > at
> >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> > at
> >
> >
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> > at
> >
> >
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> > at
> >
> >
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> >
> >
> > *2014-02-08 20:43:57,382 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> > ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> > state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> > ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> > state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> > ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> > server=
> > hdtest208.svl.ibm.com,60020,1391887547473}
> >
> >
> > *2014-02-08 21:41:14,093 INFO
> org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {010c1981882d1a59201af5e2dc589d44
> > state=SPLITTING_NEW,*ts=1391924474093, server=
> > hdtest208.svl.ibm.com,60020,1391887547473} to
> > {*010c1981882d1a59201af5e2dc589d44
> > state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> > ,60020,1391887547473}
> > 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> > ,60020,1391887547473
> > 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> > ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> > {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, server=
> > hdtest208.svl.ibm.com,60020,1391887547473}
> > 2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
> > Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> > ,60020,1391887547473
> > 2014-02-08 21:41:14,100 INFO
> > org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
> >
> >
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> > daughter
> >
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> > daughter
> >
> >
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> > on hdtest208.svl.ibm.com,60020,1391887547473
> >
>

Re: taking a hour to get a spitted region online? 0.96.0

Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin master log from 20:40:00 to 21:42:00 so that we get more
context on what happened to tpch_hb_1000_2.lineitem,,1391921037353.
010c1981882d1a59201af5e2dc589d44. ?

Cheers


On Sun, Feb 9, 2014 at 2:12 PM, Jerry He <je...@gmail.com> wrote:

> Hi, folks
>
> This is what I am seeing when running in a stress env.   I am getting
> "RetriesExhaustedWithDetailsException: Failed 748 actions:
> NotServingRegionException"
> On the master log, *2014-02-08 20:43 is the timestamp from OFFLINE to *
> *SPLITTING_NEW*, *2014-02-08 21:41 is the timestamp from **SPLITTING_NEW to
> OPEN.*
> Am I seeing anything wrong here?
>
> 2014-02-08 20:45:53,215 WARN mapreduce.Counters: Group FileSystemCounters
> is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> *2014-02-08 20:50:44,824* WARN org.apache.hadoop.hbase.client.AsyncProcess:
> Attempt #35/35 failed for 748 ops on hdtest208.svl.ibm.com
> ,60020,1391887547473
> NOT resubmitting.
>
> region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> hostname=hdtest208.svl.ibm.com,60020,1391887547473, seqNum=1
> 2014-02-08 20:50:44,839 INFO org.apache.sqoop.mapreduce.AutoProgressMapper:
> Auto-progress thread is finished. keepGoing=false
> 2014-02-08 20:50:44,842 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2014-02-08 20:50:44,858 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hive (auth:SIMPLE)
> cause:org.apache.hadoop.hbase.client.*RetriesExhaustedWithDetailsException:
> Failed 748 actions: NotServingRegionException*: 748 times,
> 2014-02-08 20:50:44,858 WARN org.apache.hadoop.mapred.Child: Error running
> child
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 748 actions: NotServingRegionException: 748 times,
> at
>
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> at
>
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> at
>
> org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> at
>
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
>
>
> *2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {010c1981882d1a59201af5e2dc589d44 state=OFFLINE,
> ts=1391921037382, server=null} to {010c1981882d1a59201af5e2dc589d44
> state=SPLITTING_NEW*, ts=1391921037382, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=OFFLINE,
> ts=1391921037382, server=null} to {c2eb9b7971ca7f3fed3da86df5b788e7
> state=SPLITTING_NEW, ts=1391921037382, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 20:43:57,382 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {b576e8db65d56ec08db5ca900587c28d state=OPEN,
> ts=1391918522959, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> {b576e8db65d56ec08db5ca900587c28d state=SPLITTING, ts=1391921037382,
> server=
> hdtest208.svl.ibm.com,60020,1391887547473}
>
>
> *2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {010c1981882d1a59201af5e2dc589d44
> state=SPLITTING_NEW,*ts=1391924474093, server=
> hdtest208.svl.ibm.com,60020,1391887547473} to
> {*010c1981882d1a59201af5e2dc589d44
> state=OPEN*, ts=1391924474093, server=hdtest208.svl.ibm.com
> ,60020,1391887547473}
> 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Onlined 010c1981882d1a59201af5e2dc589d44 on hdtest208.svl.ibm.com
> ,60020,1391887547473
> 2014-02-08 21:41:14,093 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transitioned {c2eb9b7971ca7f3fed3da86df5b788e7 state=SPLITTING_NEW,
> ts=1391924474093, server=hdtest208.svl.ibm.com,60020,1391887547473} to
> {c2eb9b7971ca7f3fed3da86df5b788e7 state=OPEN, ts=1391924474093, server=
> hdtest208.svl.ibm.com,60020,1391887547473}
> 2014-02-08 21:41:14,094 INFO org.apache.hadoop.hbase.master.RegionStates:
> Onlined c2eb9b7971ca7f3fed3da86df5b788e7 on hdtest208.svl.ibm.com
> ,60020,1391887547473
> 2014-02-08 21:41:14,100 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Handled SPLIT event;
>
> parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d.,
> daughter
> a=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.,
> daughter
>
> b=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.,
> on hdtest208.svl.ibm.com,60020,1391887547473
>