Posted to user@hbase.apache.org by schubert zhang <zs...@gmail.com> on 2009/03/17 08:38:49 UTC

Two issues for sharing when using MapReduce to insert rows into HBase

Hi all,

I am running a MapReduce job to read files and insert rows into an
HBase table. It is like an ETL procedure in the database world.

Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master; each node is a Dell 2950
server with 4GB memory and 1TB disk.

Issue 1.

Each run of the MapReduce job consumes a batch of files, and I want to keep
this pipeline balanced (I have defined the HBase table with a TTL) since many
new incoming files are waiting to be consumed. But sometimes many
RetriesExhaustedExceptions like the following occur and the job is blocked
for a long time, so the MapReduce job falls behind and cannot catch up with
the rate of incoming files.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server Some server for region
TABA,113976305209@2009-03-17 13:11:04.135,1237268430402, row
'113976399906@2009-03-17 13:40:12.213', but failed after 10 attempts.

Exceptions:

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
	at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
	at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
	at net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
	at net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
	at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
	at net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
	at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.Child.main(Child.java:158)
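
For context, the write path in that stack trace looks roughly like the sketch
below. This is a simplified assumption, not our actual TableRowRecordWriter,
and the table and column names are placeholders. Rows are committed through
HTable, and flushCommits() sends them in a batch via processBatchOfRows,
which is the call that exhausts its retries.

  // A simplified sketch (assumed, not the real TableRowRecordWriter) of the
  // write path in the stack trace, using the HBase 0.19 client API.
  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.io.BatchUpdate;

  public class SketchTableRowWriter {
    private final HTable table;

    public SketchTableRowWriter(String tableName) throws IOException {
      table = new HTable(new HBaseConfiguration(), tableName);
      // One way to send fewer, larger batches per RPC:
      // table.setAutoFlush(false);
      // table.setWriteBufferSize(2 * 1024 * 1024);
    }

    public void write(String rowKey, byte[] value) throws IOException {
      BatchUpdate update = new BatchUpdate(rowKey);
      update.put("data:raw", value); // hypothetical column
      // commit() fills the client-side write buffer; flushCommits() then
      // sends the batch via processBatchOfRows -- the call failing above.
      table.commit(update);
    }

    public void close() throws IOException {
      table.flushCommits();
    }
  }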

I checked the code and found that the retry policy for this exception is
similar to the BSD TCP SYN backoff policy:

  /**
   * This is a retry backoff multiplier table similar to the BSD TCP syn
   * backoff table, a bit more aggressive than simple exponential backoff.
   */
  public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 };

So 10 retries will take a long time: the backoff multipliers sum to 71, and
with the default hbase.client.pause of 2 seconds that is 71 * 2 = 142 seconds.
So I think the following two parameters can be changed, and I reduced them:
hbase.client.pause = 1 (second)
hbase.client.retries.number = 4
Now the blocking time should be (1 + 1 + 1 + 2) * 1 = 5 seconds.
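
To make the arithmetic concrete, here is a small standalone sketch (not HBase
code) that sums the backoff multipliers for a given pause and retry count:

  // Estimates total client blocking time from hbase.client.pause (seconds)
  // and hbase.client.retries.number, using the RETRY_BACKOFF table above.
  public class RetryWaitEstimate {
    static final int[] RETRY_BACKOFF = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 };

    static long totalWaitSeconds(long pauseSeconds, int retries) {
      long total = 0;
      for (int i = 0; i < retries; i++) {
        // Past the end of the table, the last multiplier is reused.
        int multiplier = RETRY_BACKOFF[Math.min(i, RETRY_BACKOFF.length - 1)];
        total += pauseSeconds * multiplier;
      }
      return total;
    }

    public static void main(String[] args) {
      System.out.println(totalWaitSeconds(2, 10)); // defaults: 71 * 2 = 142
      System.out.println(totalWaitSeconds(1, 4));  // reduced: (1+1+1+2) * 1 = 5
    }
  }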

In fact, I have sometimes met this exception in other code paths as well:
getRegionServerWithRetries,
getRegionLocationForRowWithRetries, and
processBatchOfRows.

I want to know what causes the above exception. Region splitting? Compaction?
Or a region moving to another regionserver?
This exception happens very frequently in my cluster.

Issue 2.

In the same workflow, I found the regionserver nodes are very busy, with a
very heavy load. The HRegionServer process takes almost all of the CPU in
use. In my previous test round I ran 4 map tasks per node to insert data; now
I run only 2 map tasks per node. I think the performance of HRegionServer is
not good enough for some of my applications, because besides the insert/input
procedure, we also query rows from the same table for online applications,
which need low latency. In my tests, the latency of an online query (a scan
of about 50 rows with startRow and endRow) is 2-6 seconds for the first query
and about 500ms for a repeated query. When the cluster is busy inserting data
into the table, the query latency may be as long as 10-30 seconds, and when
the table is splitting/compacting, maybe longer.
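
For reference, a minimal sketch of the kind of online range query I mean,
written against the 0.19-era client API; the table name, column family, and
row keys below are just placeholders:

  // Scans the rows between startRow and stopRow (HBase 0.19 client API).
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Scanner;
  import org.apache.hadoop.hbase.io.RowResult;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SketchRangeQuery {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "TABA");
      byte[][] columns = { Bytes.toBytes("data:") }; // hypothetical family
      byte[] startRow = Bytes.toBytes("113976399906@2009-03-17 13:00:00.000");
      byte[] stopRow = Bytes.toBytes("113976399906@2009-03-17 14:00:00.000");
      Scanner scanner = table.getScanner(columns, startRow, stopRow);
      try {
        for (RowResult row : scanner) { // expect ~50 rows in this range
          System.out.println(Bytes.toString(row.getRow()));
        }
      } finally {
        scanner.close();
      }
    }
  }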

Re: Two issues for sharing when using MapReduce to insert rows into HBase

Posted by schubert zhang <zs...@gmail.com>.
I checked the logs of my MapReduce job and the HMaster.

Almost immediately after the RetriesExhaustedException in the MapReduce job,
there is a burst of log entries in the HMaster log.

For example:
09/03/20 02:17:08 INFO mapred.JobClient:  map 9% reduce 0%
09/03/20 02:17:08 INFO mapred.JobClient: Task Id :
attempt_200903191955_0098_m_000001_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region
TABA,13602027108@2009-03-19 10:36:21.453,1237458164945, row
'13602035289@2009-03-19 23:05:38.327', but failed after 6 attempts.
Exceptions:

        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(Unknown Source)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(Unknown Source)
        at org.apache.hadoop.hbase.client.HTable.commit(Unknown Source)
        at org.apache.hadoop.hbase.client.HTable.commit(Unknown Source)
        at net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
        at net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(Unknown Source)
        at net.sandmill.control.IndexingJob.map(IndexingJob.java:128)
        at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
        at org.apache.hadoop.mapred.MapRunner.run(Unknown Source)
        at org.apache.hadoop.mapred.MapTask.run(Unknown Source)
        at org.apache.hadoop.mapred.Child.main(Unknown Source)


Then there are many HMaster log entries like the following:


2009-03-20 02:16:10,086 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-03-20 02:17:10,012 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.24.1.20:60020}
2009-03-20 02:17:10,054 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0,
startKey: <>, server: 10.24.1.14:60020}
2009-03-20 02:17:10,063 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {regionname:
-ROOT-,,0, startKey: <>, server: 10.24.1.14:60020} complete
2009-03-20 02:17:10,105 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 129 row(s) of meta region {regionname:
.META.,,1, startKey: <>, server: 10.24.1.20:60020} complete
2009-03-20 02:17:10,105 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-03-20 02:18:01,238 INFO org.apache.hadoop.hbase.master.ServerManager:
Received start message from: 10.24.1.18:60020
2009-03-20 02:18:02,223 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 0 region(s) that are in transition states
2009-03-20 02:18:02,912 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 0 region(s) that are in transition states
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13875589468@2009-03-1910:17:48.581,1237459173982: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13802060053@2009-03-1912:47:20.283,1237458108510: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13976353085@2009-03-1911:07:26.687,1237459144470: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13976388229@2009-03-1912:04:19.010,1237458234998: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13675544146@2009-03-1910:58:15.064,1237457576121: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13802071807@2009-03-1913:42:56.190,1237458108510: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13775572279@2009-03-1911:21:20.799,1237464844877: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13602097353@2009-03-1911:30:54.777,1237464464823: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,229 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13776377737@2009-03-1911:23:32.147,1237457693635: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:05,230 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 0 region(s) that are in transition states
2009-03-20 02:18:05,915 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13902041536@2009-03-1911:43:15.675,1237464933068: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,915 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13975523827@2009-03-1916:30:34.339,1237458918930: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,915 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13676349270@2009-03-1914:49:25.743,1237457885852: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,915 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13602073973@2009-03-1910:24:28.777,1237465040643: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,915 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13502069929@2009-03-1912:34:09.611,1237458221492: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,916 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13775584027@2009-03-1910:43:33.404,1237464844877: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,916 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13875507078@2009-03-1912:21:33.611,1237458356583: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,916 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:05,916 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13576345069@2009-03-1915:25:14.545,1237464702934: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:07,245 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13976353085@2009-03-19 11:07:26.687,1237459144470 to
server 10.24.1.18:60020
2009-03-20 02:18:07,246 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13875589468@2009-03-19 10:17:48.581,1237459173982 to
server 10.24.1.18:60020
2009-03-20 02:18:07,247 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13775572279@2009-03-19 11:21:20.799,1237464844877 to
server 10.24.1.18:60020
2009-03-20 02:18:07,247 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13776377737@2009-03-19 11:23:32.147,1237457693635 to
server 10.24.1.18:60020
2009-03-20 02:18:07,247 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13802071807@2009-03-19 13:42:56.190,1237458108510 to
server 10.24.1.18:60020
2009-03-20 02:18:08,234 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13575528220@2009-03-1912:13:07.415,1237458863754: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:08,234 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13902053262@2009-03-1910:26:08.452,1237464933068: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:08,234 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13876394981@2009-03-1912:03:57.912,1237458100402: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:08,234 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13702019399@2009-03-1912:27:42.240,1237464907805: safeMode=false
from
10.24.1.14:60020
2009-03-20 02:18:08,919 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:10,023 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.24.1.20:60020}
2009-03-20 02:18:10,065 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0,
startKey: <>, server: 10.24.1.14:60020}
2009-03-20 02:18:10,072 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {regionname:
-ROOT-,,0, startKey: <>, server: 10.24.1.14:60020} complete
2009-03-20 02:18:10,110 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 129 row(s) of meta region {regionname:
.META.,,1, startKey: <>, server: 10.24.1.20:60020} complete
2009-03-20 02:18:10,110 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13875589468@2009-03-1910:17:48.581,1237459173982: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13775572279@2009-03-1911:21:20.799,1237464844877: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13776377737@2009-03-1911:23:32.147,1237457693635: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13802071807@2009-03-1913:42:56.190,1237458108510: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13976353085@2009-03-1911:07:26.687,1237459144470: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13875589468@2009-03-1910:17:48.581,1237459173982: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13775572279@2009-03-1911:21:20.799,1237464844877: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13976353085@2009-03-19 11:07:26.687,1237459144470 open on
10.24.1.18:60020
2009-03-20 02:18:10,516 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13976353085@2009-03-19 11:07:26.687,1237459144470 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:10,518 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13875589468@2009-03-19 10:17:48.581,1237459173982 open on
10.24.1.18:60020
2009-03-20 02:18:10,518 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13875589468@2009-03-19 10:17:48.581,1237459173982 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:10,519 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13775572279@2009-03-19 11:21:20.799,1237464844877 open on
10.24.1.18:60020
2009-03-20 02:18:10,519 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13775572279@2009-03-19 11:21:20.799,1237464844877 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:13,531 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13802071807@2009-03-1913:42:56.190,1237458108510: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:13,531 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13776377737@2009-03-1911:23:32.147,1237457693635: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:13,531 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13802071807@2009-03-1913:42:56.190,1237458108510: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:13,531 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13776377737@2009-03-19 11:23:32.147,1237457693635 open on
10.24.1.18:60020
2009-03-20 02:18:13,531 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13776377737@2009-03-19 11:23:32.147,1237457693635 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:13,531 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13876394981@2009-03-19 12:03:57.912,1237458100402 to
server 10.24.1.18:60020
2009-03-20 02:18:13,532 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13575528220@2009-03-19 12:13:07.415,1237458863754 to
server 10.24.1.18:60020
2009-03-20 02:18:13,532 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13976388229@2009-03-19 12:04:19.010,1237458234998 to
server 10.24.1.18:60020
2009-03-20 02:18:13,533 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13802060053@2009-03-19 12:47:20.283,1237458108510 to
server 10.24.1.18:60020
2009-03-20 02:18:13,533 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13675544146@2009-03-19 10:58:15.064,1237457576121 to
server 10.24.1.18:60020
2009-03-20 02:18:13,534 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13802071807@2009-03-19 13:42:56.190,1237458108510 open on
10.24.1.18:60020
2009-03-20 02:18:13,534 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13802071807@2009-03-19 13:42:56.190,1237458108510 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13575528220@2009-03-1912:13:07.415,1237458863754: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13976388229@2009-03-1912:04:19.010,1237458234998: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13802060053@2009-03-1912:47:20.283,1237458108510: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675544146@2009-03-1910:58:15.064,1237457576121: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13876394981@2009-03-1912:03:57.912,1237458100402: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13575528220@2009-03-1912:13:07.415,1237458863754: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13976388229@2009-03-1912:04:19.010,1237458234998: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13876394981@2009-03-19 12:03:57.912,1237458100402 open on
10.24.1.18:60020
2009-03-20 02:18:16,547 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13876394981@2009-03-19 12:03:57.912,1237458100402 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:16,549 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13575528220@2009-03-19 12:13:07.415,1237458863754 open on
10.24.1.18:60020
2009-03-20 02:18:16,549 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13575528220@2009-03-19 12:13:07.415,1237458863754 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:16,549 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13976388229@2009-03-19 12:04:19.010,1237458234998 open on
10.24.1.18:60020
2009-03-20 02:18:16,549 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13976388229@2009-03-19 12:04:19.010,1237458234998 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:19,557 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675544146@2009-03-1910:58:15.064,1237457576121: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:19,557 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13802060053@2009-03-1912:47:20.283,1237458108510: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:19,557 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13675544146@2009-03-1910:58:15.064,1237457576121: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:19,557 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13802060053@2009-03-19 12:47:20.283,1237458108510 open on
10.24.1.18:60020
2009-03-20 02:18:19,557 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13802060053@2009-03-19 12:47:20.283,1237458108510 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:19,557 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13602097353@2009-03-19 11:30:54.777,1237464464823 to
server 10.24.1.18:60020
2009-03-20 02:18:19,558 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13676349270@2009-03-19 14:49:25.743,1237457885852 to
server 10.24.1.18:60020
2009-03-20 02:18:19,559 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13702019399@2009-03-19 12:27:42.240,1237464907805 to
server 10.24.1.18:60020
2009-03-20 02:18:19,559 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13576345069@2009-03-19 15:25:14.545,1237464702934 to
server 10.24.1.18:60020
2009-03-20 02:18:19,560 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13675544146@2009-03-19 10:58:15.064,1237457576121 open on
10.24.1.18:60020
2009-03-20 02:18:19,560 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13675544146@2009-03-19 10:58:15.064,1237457576121 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:22,570 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13676349270@2009-03-1914:49:25.743,1237457885852: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13702019399@2009-03-1912:27:42.240,1237464907805: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576345069@2009-03-1915:25:14.545,1237464702934: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13602097353@2009-03-1911:30:54.777,1237464464823: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13676349270@2009-03-1914:49:25.743,1237457885852: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13602097353@2009-03-19 11:30:54.777,1237464464823 open on
10.24.1.18:60020
2009-03-20 02:18:22,571 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13602097353@2009-03-19 11:30:54.777,1237464464823 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:22,571 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13875507078@2009-03-19 12:21:33.611,1237458356583 to
server 10.24.1.18:60020
2009-03-20 02:18:22,572 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13902053262@2009-03-19 10:26:08.452,1237464933068 to
server 10.24.1.18:60020
2009-03-20 02:18:22,573 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13775584027@2009-03-19 10:43:33.404,1237464844877 to
server 10.24.1.18:60020
2009-03-20 02:18:22,573 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13975523827@2009-03-19 16:30:34.339,1237458918930 to
server 10.24.1.18:60020
2009-03-20 02:18:22,573 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13602073973@2009-03-19 10:24:28.777,1237465040643 to
server 10.24.1.18:60020
2009-03-20 02:18:22,574 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13902041536@2009-03-19 11:43:15.675,1237464933068 to
server 10.24.1.18:60020
2009-03-20 02:18:22,574 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13502069929@2009-03-19 12:34:09.611,1237458221492 to
server 10.24.1.18:60020
2009-03-20 02:18:22,575 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13776389501@2009-03-19 10:44:13.310,1237459140493 to
server 10.24.1.18:60020
2009-03-20 02:18:22,575 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13602027108@2009-03-19 10:36:21.453,1237458164945 to
server 10.24.1.18:60020
2009-03-20 02:18:22,575 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13502093230@2009-03-19 10:26:01.389,1237465300311 to
server 10.24.1.18:60020
2009-03-20 02:18:22,576 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13676349270@2009-03-19 14:49:25.743,1237457885852 open on
10.24.1.18:60020
2009-03-20 02:18:22,576 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13676349270@2009-03-19 14:49:25.743,1237457885852 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:23,457 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 0 region(s) that are in transition states
2009-03-20 02:18:23,935 INFO org.apache.hadoop.hbase.master.RegionManager:
Skipped 0 region(s) that are in transition states
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576345069@2009-03-1915:25:14.545,1237464702934: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13875507078@2009-03-1912:21:33.611,1237458356583: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902053262@2009-03-1910:26:08.452,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13775584027@2009-03-1910:43:33.404,1237464844877: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13975523827@2009-03-1916:30:34.339,1237458918930: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602073973@2009-03-1910:24:28.777,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902041536@2009-03-1911:43:15.675,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502069929@2009-03-1912:34:09.611,1237458221492: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,591 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13702019399@2009-03-1912:27:42.240,1237464907805: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13576345069@2009-03-1915:25:14.545,1237464702934: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13875507078@2009-03-1912:21:33.611,1237458356583: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13702019399@2009-03-19 12:27:42.240,1237464907805 open on
10.24.1.18:60020
2009-03-20 02:18:25,592 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13702019399@2009-03-19 12:27:42.240,1237464907805 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:25,593 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13576345069@2009-03-19 15:25:14.545,1237464702934 open on
10.24.1.18:60020
2009-03-20 02:18:25,593 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13576345069@2009-03-19 15:25:14.545,1237464702934 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:25,594 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13875507078@2009-03-19 12:21:33.611,1237458356583 open on
10.24.1.18:60020
2009-03-20 02:18:25,594 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13875507078@2009-03-19 12:21:33.611,1237458356583 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13576310002@2009-03-1911:48:25.420,1237458443847: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13675579207@2009-03-1916:29:27.989,1237465117541: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13576356781@2009-03-1912:25:38.416,1237464513437: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13675590841@2009-03-1911:39:32.937,1237458415135: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13775595768@2009-03-1911:21:53.890,1237458914764: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13602085653@2009-03-1915:52:31.252,1237465040643: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13502046687@2009-03-1911:34:02.302,1237458126694: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13702078070@2009-03-1910:14:49.190,1237458450828: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,463 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13676396003@2009-03-1911:46:08.485,1237464861537: safeMode=false
from
10.24.1.20:60020
2009-03-20 02:18:26,938 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13776342680@2009-03-1910:24:56.416,1237458469261: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:26,938 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13676337519@2009-03-1911:31:55.904,1237457885852: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:26,938 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:26,938 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13575586650@2009-03-1910:23:52.770,1237464888516: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:26,938 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13902088543@2009-03-1911:19:02.098,1237458838973: safeMode=false
from
10.24.1.12:60020
2009-03-20 02:18:28,600 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13775584027@2009-03-1910:43:33.404,1237464844877: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13975523827@2009-03-1916:30:34.339,1237458918930: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602073973@2009-03-1910:24:28.777,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902041536@2009-03-1911:43:15.675,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502069929@2009-03-1912:34:09.611,1237458221492: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13902053262@2009-03-1910:26:08.452,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13775584027@2009-03-1910:43:33.404,1237464844877: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13902053262@2009-03-19 10:26:08.452,1237464933068 open on
10.24.1.18:60020
2009-03-20 02:18:28,601 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13902053262@2009-03-19 10:26:08.452,1237464933068 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:28,602 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13775584027@2009-03-19 10:43:33.404,1237464844877 open on
10.24.1.18:60020
2009-03-20 02:18:28,602 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13775584027@2009-03-19 10:43:33.404,1237464844877 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602073973@2009-03-1910:24:28.777,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902041536@2009-03-1911:43:15.675,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502069929@2009-03-1912:34:09.611,1237458221492: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13975523827@2009-03-1916:30:34.339,1237458918930: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13602073973@2009-03-1910:24:28.777,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13902041536@2009-03-1911:43:15.675,1237464933068: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13975523827@2009-03-19 16:30:34.339,1237458918930 open on
10.24.1.18:60020
2009-03-20 02:18:31,613 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13975523827@2009-03-19 16:30:34.339,1237458918930 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:31,614 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13602073973@2009-03-19 10:24:28.777,1237465040643 open on
10.24.1.18:60020
2009-03-20 02:18:31,614 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13602073973@2009-03-19 10:24:28.777,1237465040643 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:31,615 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13902041536@2009-03-19 11:43:15.675,1237464933068 open on
10.24.1.18:60020
2009-03-20 02:18:31,615 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13902041536@2009-03-19 11:43:15.675,1237464933068 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:34,619 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:34,619 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:34,620 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:34,620 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13502069929@2009-03-1912:34:09.611,1237458221492: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:34,620 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13502069929@2009-03-19 12:34:09.611,1237458221492 open on
10.24.1.18:60020
2009-03-20 02:18:34,620 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13502069929@2009-03-19 12:34:09.611,1237458221492 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:34,620 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13575586650@2009-03-19 10:23:52.770,1237464888516 to
server 10.24.1.18:60020
2009-03-20 02:18:34,621 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13576356781@2009-03-19 12:25:38.416,1237464513437 to
server 10.24.1.18:60020
2009-03-20 02:18:34,622 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13676396003@2009-03-19 11:46:08.485,1237464861537 to
server 10.24.1.18:60020
2009-03-20 02:18:34,622 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13702078070@2009-03-19 10:14:49.190,1237458450828 to
server 10.24.1.18:60020
2009-03-20 02:18:34,622 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13902076766@2009-03-19 12:37:19.721,1237458597186 to
server 10.24.1.18:60020
2009-03-20 02:18:34,623 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13602085653@2009-03-19 15:52:31.252,1237465040643 to
server 10.24.1.18:60020
2009-03-20 02:18:34,623 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13675579207@2009-03-19 16:29:27.989,1237465117541 to
server 10.24.1.18:60020
2009-03-20 02:18:34,623 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13502046687@2009-03-19 11:34:02.302,1237458126694 to
server 10.24.1.18:60020
2009-03-20 02:18:34,624 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13576310002@2009-03-19 11:48:25.420,1237458443847 to
server 10.24.1.18:60020
2009-03-20 02:18:34,624 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13675590841@2009-03-19 11:39:32.937,1237458415135 to
server 10.24.1.18:60020
2009-03-20 02:18:40,393 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,393 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13575586650@2009-03-1910:23:52.770,1237464888516: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576356781@2009-03-1912:25:38.416,1237464513437: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13676396003@2009-03-1911:46:08.485,1237464861537: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13702078070@2009-03-1910:14:49.190,1237458450828: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602085653@2009-03-1915:52:31.252,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675579207@2009-03-1916:29:27.989,1237465117541: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502046687@2009-03-1911:34:02.302,1237458126694: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576310002@2009-03-1911:48:25.420,1237458443847: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675590841@2009-03-1911:39:32.937,1237458415135: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13776389501@2009-03-1910:44:13.310,1237459140493: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13602027108@2009-03-1910:36:21.453,1237458164945: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13776389501@2009-03-19 10:44:13.310,1237459140493 open on
10.24.1.18:60020
2009-03-20 02:18:40,394 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13776389501@2009-03-19 10:44:13.310,1237459140493 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:40,395 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13602027108@2009-03-19 10:36:21.453,1237458164945 open on
10.24.1.18:60020
2009-03-20 02:18:40,395 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13602027108@2009-03-19 10:36:21.453,1237458164945 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:44,433 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13502093230@2009-03-1910:26:01.389,1237465300311: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,433 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576356781@2009-03-1912:25:38.416,1237464513437: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13676396003@2009-03-1911:46:08.485,1237464861537: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13702078070@2009-03-1910:14:49.190,1237458450828: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13502093230@2009-03-19 10:26:01.389,1237465300311 open on
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602085653@2009-03-1915:52:31.252,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13502093230@2009-03-19 10:26:01.389,1237465300311 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675579207@2009-03-1916:29:27.989,1237465117541: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502046687@2009-03-1911:34:02.302,1237458126694: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576310002@2009-03-1911:48:25.420,1237458443847: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675590841@2009-03-1911:39:32.937,1237458415135: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,434 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13575586650@2009-03-1910:23:52.770,1237464888516: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:44,435 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13575586650@2009-03-19 10:23:52.770,1237464888516 open on
10.24.1.18:60020
2009-03-20 02:18:44,435 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13575586650@2009-03-19 10:23:52.770,1237464888516 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13576356781@2009-03-1912:25:38.416,1237464513437: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13702078070@2009-03-1910:14:49.190,1237458450828: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602085653@2009-03-1915:52:31.252,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675579207@2009-03-1916:29:27.989,1237465117541: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13576356781@2009-03-19 12:25:38.416,1237464513437 open on
10.24.1.18:60020
2009-03-20 02:18:47,923 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502046687@2009-03-1911:34:02.302,1237458126694: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,924 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13576356781@2009-03-19 12:25:38.416,1237464513437 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:47,924 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576310002@2009-03-1911:48:25.420,1237458443847: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,924 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675590841@2009-03-1911:39:32.937,1237458415135: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,924 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13676396003@2009-03-1911:46:08.485,1237464861537: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:47,925 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13676396003@2009-03-19 11:46:08.485,1237464861537 open on
10.24.1.18:60020
2009-03-20 02:18:47,925 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13676396003@2009-03-19 11:46:08.485,1237464861537 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13602085653@2009-03-1915:52:31.252,1237465040643: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675579207@2009-03-1916:29:27.989,1237465117541: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13502046687@2009-03-1911:34:02.302,1237458126694: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13576310002@2009-03-1911:48:25.420,1237458443847: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13675590841@2009-03-1911:39:32.937,1237458415135: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,955 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13702078070@2009-03-1910:14:49.190,1237458450828: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,956 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13902076766@2009-03-1912:37:19.721,1237458597186: safeMode=false
from
10.24.1.18:60020
2009-03-20 02:18:50,956 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13702078070@2009-03-19 10:14:49.190,1237458450828 open on
10.24.1.18:60020
2009-03-20 02:18:50,956 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13702078070@2009-03-19 10:14:49.190,1237458450828 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:50,957 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13902076766@2009-03-19 12:37:19.721,1237458597186 open on
10.24.1.18:60020
2009-03-20 02:18:50,957 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13902076766@2009-03-19 12:37:19.721,1237458597186 in region .META.,,1
with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN: TABA,13675579207@2009-03-19 16:29:27.989,1237465117541: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13502046687@2009-03-19 11:34:02.302,1237458126694: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13576310002@2009-03-19 11:48:25.420,1237458443847: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13675590841@2009-03-19 11:39:32.937,1237458415135: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13602085653@2009-03-19 15:52:31.252,1237465040643: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13602085653@2009-03-19 15:52:31.252,1237465040643 open on 10.24.1.18:60020
2009-03-20 02:18:53,963 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13602085653@2009-03-19 15:52:31.252,1237465040643 in region .META.,,1 with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:53,964 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TABA,13775595768@2009-03-19 11:21:53.890,1237458914764 to server 10.24.1.12:60020
2009-03-20 02:18:53,965 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TABA,13676337519@2009-03-19 11:31:55.904,1237457885852 to server 10.24.1.12:60020
2009-03-20 02:18:53,965 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TABA,13902088543@2009-03-19 11:19:02.098,1237458838973 to server 10.24.1.12:60020
2009-03-20 02:18:53,966 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TABA,13776342680@2009-03-19 10:24:56.416,1237458469261 to server 10.24.1.12:60020
2009-03-20 02:18:56,969 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13502046687@2009-03-19 11:34:02.302,1237458126694: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:56,969 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13576310002@2009-03-19 11:48:25.420,1237458443847: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:56,970 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13675590841@2009-03-19 11:39:32.937,1237458415135: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:56,970 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13675579207@2009-03-19 16:29:27.989,1237465117541: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:56,970 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13675579207@2009-03-19 16:29:27.989,1237465117541 open on 10.24.1.18:60020
2009-03-20 02:18:56,970 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13675579207@2009-03-19 16:29:27.989,1237465117541 in region .META.,,1 with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13676337519@2009-03-19 11:31:55.904,1237457885852: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13902088543@2009-03-19 11:19:02.098,1237458838973: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13776342680@2009-03-19 10:24:56.416,1237458469261: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13775595768@2009-03-19 11:21:53.890,1237458914764: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13676337519@2009-03-19 11:31:55.904,1237457885852: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13775595768@2009-03-19 11:21:53.890,1237458914764 open on 10.24.1.12:60020
2009-03-20 02:18:56,978 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13775595768@2009-03-19 11:21:53.890,1237458914764 in region .META.,,1 with startcode 1237477067256 and server 10.24.1.12:60020
2009-03-20 02:18:56,979 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13676337519@2009-03-19 11:31:55.904,1237457885852 open on 10.24.1.12:60020
2009-03-20 02:18:56,979 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13676337519@2009-03-19 11:31:55.904,1237457885852 in region .META.,,1 with startcode 1237477067256 and server 10.24.1.12:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13576310002@2009-03-19 11:48:25.420,1237458443847: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13675590841@2009-03-19 11:39:32.937,1237458415135: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13502046687@2009-03-19 11:34:02.302,1237458126694: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13576310002@2009-03-19 11:48:25.420,1237458443847: safeMode=false from 10.24.1.18:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13502046687@2009-03-19 11:34:02.302,1237458126694 open on 10.24.1.18:60020
2009-03-20 02:18:59,980 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13502046687@2009-03-19 11:34:02.302,1237458126694 in region .META.,,1 with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:59,982 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13576310002@2009-03-19 11:48:25.420,1237458443847 open on 10.24.1.18:60020
2009-03-20 02:18:59,982 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13576310002@2009-03-19 11:48:25.420,1237458443847 in region .META.,,1 with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:18:59,989 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13776342680@2009-03-19 10:24:56.416,1237458469261: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:59,989 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13902088543@2009-03-19 11:19:02.098,1237458838973: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:59,989 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13776342680@2009-03-19 10:24:56.416,1237458469261: safeMode=false from 10.24.1.12:60020
2009-03-20 02:18:59,989 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13902088543@2009-03-19 11:19:02.098,1237458838973 open on 10.24.1.12:60020
2009-03-20 02:18:59,989 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13902088543@2009-03-19 11:19:02.098,1237458838973 in region .META.,,1 with startcode 1237477067256 and server 10.24.1.12:60020
2009-03-20 02:18:59,990 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13776342680@2009-03-19 10:24:56.416,1237458469261 open on 10.24.1.12:60020
2009-03-20 02:18:59,990 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13776342680@2009-03-19 10:24:56.416,1237458469261 in region .META.,,1 with startcode 1237477067256 and server 10.24.1.12:60020
2009-03-20 02:19:02,987 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13675590841@2009-03-19 11:39:32.937,1237458415135: safeMode=false from 10.24.1.18:60020
2009-03-20 02:19:02,987 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13675590841@2009-03-19 11:39:32.937,1237458415135 open on 10.24.1.18:60020
2009-03-20 02:19:02,987 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13675590841@2009-03-19 11:39:32.937,1237458415135 in region .META.,,1 with startcode 1237486635830 and server 10.24.1.18:60020
2009-03-20 02:19:10,033 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.24.1.20:60020}
2009-03-20 02:19:10,075 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.24.1.14:60020}
2009-03-20 02:19:10,086 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: 10.24.1.14:60020} complete




On Thu, Mar 19, 2009 at 12:56 AM, schubert zhang <zs...@gmail.com> wrote:

> I have run HBase-0.19.1 and Hadoop-0.19.2 for about 7 hours, and it seems
> fine. But I am still experiencing RetriesExhaustedException.
> This issue keeps my MapReduce job from catching up with the speed of
> incoming data.
>
> Schubert
>
> 09/03/19 00:43:59 INFO mapred.JobClient: Task Id :
> attempt_200903182025_0072_m_000005_2, Status : FAILED
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region TABA,113502069842@2009-03-18 15:46:00.174,1237391402392, row '13502070621@2009-03-18 23:15:20.348', but failed after 5 attempts.
> Exceptions:
>
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:960)
>         at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>         at
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>         at
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>         at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:128)
>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
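For context, the failure point in this stack trace is the 0.19-era client-side write buffer: HTable.commit() queues each update, and flushCommits() pushes the buffered batch to the regionservers, which is where the retries get exhausted. Below is a minimal sketch of a writer along these lines, assuming the 0.19 client API; the table name, column, row keys, and buffer size are illustrative, not taken from the job above.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedTableWriter {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "TABA");      // table name assumed
    table.setAutoFlush(false);                    // buffer commits client-side
    table.setWriteBufferSize(4 * 1024 * 1024);    // 4MB buffer, illustrative
    for (int i = 0; i < 100000; i++) {
      BatchUpdate bu = new BatchUpdate("row-" + i);    // row key illustrative
      bu.put("info:col", Bytes.toBytes("value-" + i)); // column assumed
      table.commit(bu);   // queued in the write buffer until it fills
    }
    table.flushCommits(); // drain the buffer; RetriesExhaustedException surfaces here
  }
}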
>
> On Wed, Mar 18, 2009 at 3:38 PM, schubert zhang <zs...@gmail.com> wrote:
>
>> Jean Daniel,
>> Now I am running HBase 0.19.1 candidate 2 and Hadoop 0.19.2 (the code
>> is from the hadoop branch-0.19, where the MapReduce bug is said to have
>> been fixed).
>>
>> I am verifying; it will need some hours of examination.
>>
>> Schubert
>>
>>
>> On Wed, Mar 18, 2009 at 10:14 AM, schubert zhang <zs...@gmail.com> wrote:
>>
>>> Jean Daniel,
>>> 1. If I keep the retries at 10, the task may wait for a long time (142
>>> seconds). So I want to try configuring
>>>  hbase.client.pause = 2
>>>  hbase.client.retries.number=5
>>> Then the waiting time would be 14 seconds (see the sketch right after
>>> this message).
>>>
>>> 2. My DELL 2950s: each node has 1 CPU with 4 cores, but there is one
>>> node (a slave) that has 2 CPUs with 2*4=8 cores.
>>>
>>> 3. The issues with hadoop 0.19.1 MapReduce are:
>>>    (1) Sometimes, some task slots on some tasktrackers are unusable.
>>>         For example, my total MapTask capacity is 20 and there are 40
>>> maps waiting to be processed, but only 13 to 16 tasks run at most.
>>>         I think it may be caused by the tasktracker blacklist feature;
>>> this issue may occur after some failed tasks.
>>>    (2) I am currently checking the logs of mapreduce: since last night,
>>> my MapReduce job has been hung up. It stopped at 79.9% for about 13 hours.
>>>         The issue looks like this one:
>>> https://issues.apache.org/jira/browse/HADOOP-5367
>>>         After a long run of MapReduce (about 200 jobs had completed),
>>> the MapReduce job hangs forever.
>>>
>>>        The JobTracker repeatedly logs:
>>>        2009-03-17 16:29:39,997 INFO org.apache.hadoop.mapred.JobTracker:
>>> Serious problem. While updating status, cannot find taskid
>>> attempt_200903171247_0387_m_000015_1
>>>
>>>        All TaskTrackers may hang up at the same time, since the logs of
>>> each TaskTracker stop at that time.
>>>
>>>        And before the hangup, I can also find odds and ends of such
>>> JobTracker logs, such as:
>>>
>>>        2009-03-17 16:29:21,767 INFO org.apache.hadoop.mapred.JobTracker:
>>> Serious problem. While updating status, cannot find taskid
>>> attempt_200903171247_0387_m_000015_1
>>> Schubert
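To make the arithmetic in point 1 above concrete: if the client sleeps hbase.client.pause * RETRY_BACKOFF[i] before retry i, as the computations in this thread assume, then the total blocked time is the pause multiplied by the sum of the first N backoff multipliers. A small sketch (the backoff table is the one quoted from the client source further down; the class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;

public class RetryWaitEstimate {
  // Backoff multiplier table, as quoted from the 0.19 client source further down.
  static final int[] RETRY_BACKOFF = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 };

  // Total time blocked before RetriesExhaustedException, in the same unit
  // as hbase.client.pause (this thread treats it as seconds).
  static long totalWait(long pause, int retries) {
    long total = 0;
    for (int i = 0; i < retries && i < RETRY_BACKOFF.length; i++) {
      total += pause * RETRY_BACKOFF[i];
    }
    return total;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("hbase.client.pause", 2);          // the two knobs being tuned
    conf.setInt("hbase.client.retries.number", 5);
    long pause = conf.getLong("hbase.client.pause", 2);
    int retries = conf.getInt("hbase.client.retries.number", 10);
    System.out.println(totalWait(pause, retries));  // 2*(1+1+1+2+2) = 14
    System.out.println(totalWait(2, 10));           // 2*71 = 142, the 10-retry case
  }
}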
>>>
>>>
>>> On Wed, Mar 18, 2009 at 9:46 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>
>>>> Schubert,
>>>>
>>>> Regarding your previous email, I didn't notice anything obvious. You
>>>> should keep the retries at 10, since you could easily have to wait more
>>>> than 10 seconds under heavy load.
>>>>
>>>> How many CPUs do you have on your Dell 2950?
>>>>
>>>> Regarding your last email, HBase 0.19 isn't compatible with Hadoop 0.18.
>>>> Why do you say it's buggy?
>>>>
>>>> Thx,
>>>>
>>>> J-D
>>>>
>>>> On Tue, Mar 17, 2009 at 9:33 PM, schubert zhang <zs...@gmail.com>
>>>> wrote:
>>>> > Hi Jean Daniel,
>>>> > I want to try HBase 0.19.1.
>>>> > And since the mapreduce of hadoop-0.19.1 is buggy, I want to use a
>>>> > stable hadoop 0.18.3; can HBase 0.19.x work on hadoop 0.18.3?
>>>> >
>>>> > Schubert
>>>> >
>>>> > On Wed, Mar 18, 2009 at 12:26 AM, schubert zhang <zs...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> Jean Daniel,
>>>> >> Thank you very much; I will re-set up my cluster with HBase 0.19.1
>>>> >> immediately, and I will also change my swappiness configuration.
>>>> >>
>>>> >> For the RetriesExhaustedException, I double-checked the logs of
>>>> >> each node.
>>>> >>
>>>> >> For example, one such exception occurred at 09/03/17 16:29:10 for
>>>> >> region: 13975557938@2009-03-17 13:42:36.412
>>>> >>
>>>> >> 09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
>>>> >> 09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
>>>> >> attempt_200903171247_0387_m_000009_0, Status : FAILED
>>>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>>> contact
>>>> >> region server Some server for region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007, row '13975559447@2009-03-17 16:25:42.772', but failed after 4 attempts.
>>>> >> Exceptions:
>>>> >>
>>>> >>         at
>>>> >>
>>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>>>> >>         at
>>>> >> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>>>> >>         at
>>>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>>>> >>         at
>>>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>>>> >>         at
>>>> >>
>>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>>>> >>         at
>>>> >>
>>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>>>> >>         at
>>>> >>
>>>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>>>> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
>>>> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>>>> >>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>> >>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>> >>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>>>> >>
>>>> >> HMaster log:
>>>> >> 2009-03-17 16:29:16,501 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from 10.24.1.20:60020
>>>> >> 2009-03-17 16:29:18,707 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 to server 10.24.1.14:60020
>>>> >> 2009-03-17 16:29:19,795 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 10.24.1.12:60020}
>>>> >> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 34 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
>>>> >> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>>>> >> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from 10.24.1.14:60020
>>>> >> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from 10.24.1.14:60020
>>>> >> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on 10.24.1.14:60020
>>>> >> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region .META.,,1 with startcode 1237274685627 and server 10.24.1.14:60020
>>>> >>
>>>> >> one node:
>>>> >> 2009-03-17 16:28:43,617 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>>>> >> 2009-03-17 16:28:43,617 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>>>> >> 2009-03-17 16:28:44,604 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007/168627948 available
>>>> >> 2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283, entries=100008. New log writer: /hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
>>>> >> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TABA,13675519693@2009-03-17 13:39:35.123,1237270466558 in 28sec
>>>> >> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
>>>> >> 2009-03-17 16:29:04,927 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714 in 0sec
>>>> >>
>>>> >> another node:
>>>> >> 2009-03-17 16:28:56,462 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region CDR,13975557938@2009-03-17 13:42:36.412,1237270434007
>>>> >> 2009-03-17 16:28:58,304 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>>>> >> 2009-03-17 16:28:58,305 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>>>> >> 2009-03-17 16:28:58,305 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>>>> >> 2009-03-17 16:28:58,343 INFO org.apache.hadoop.hbase.regionserver.HRegion: region CDR,13775569214@2009-03-17 13:13:57.898,1237278535949/1626944070 available
>>>> >> 2009-03-17 16:28:58,343 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>>>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,387 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> >> 2009-03-17 16:28:58,388 INFO org.apache.hadoop.hbase.regionserver.HRegion: region CDR,13775545668@2009-03-17 13:03:35.830,1237278535949/1656930599 available
>>>> >> 2009-03-17 16:29:01,325 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
>>>> >>
>>>> >> On Tue, Mar 17, 2009 at 8:02 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>> >>
>>>> >>> Schubert,
>>>> >>>
>>>> >>> Yeah latency was always an issue in HBase since reliability was our
>>>> >>> focus. This is going to be fixed in 0.20 with a new file format and
>>>> >>> caching of hot cells.
>>>> >>>
>>>> >>> Regarding your problem, please enable DEBUG and look in your
>>>> >>> regionserver logs to see what's happening. Usually, if you get a
>>>> >>> RetriesExhausted, it's because something is taking too long to get a
>>>> >>> region reassigned.
>>>> >>>
>>>> >>> I also suggest that you upgrade to HBase 0.19.1 candidate 2, which
>>>> >>> has 2 fixes for massive uploads like yours.
>>>> >>>
>>>> >>> I also see that there was some swap, please try setting the
>>>> swappiness
>>>> >>> to something like 20. It helped me a lot!
>>>> >>> http://www.sollers.ca/blog/2008/swappiness/
>>>> >>>
>>>> >>> J-D
>>>> >>>
>>>> >>> On Tue, Mar 17, 2009 at 3:49 AM, schubert zhang <zs...@gmail.com>
>>>> >>> wrote:
>>>> >>> > This the "top" info of a regionserver/datanode/tasktracker node.We
>>>> can
>>>> >>> see
>>>> >>> > the HRegionServer node is very heavy-loaded.
>>>> >>> >
>>>> >>> > top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
>>>> >>> > Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
>>>> >>> > Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,  1.2%si,  0.0%st
>>>> >>> > Mem:   4043484k total,  2816028k used,  1227456k free,    18308k buffers
>>>> >>> > Swap:  2097144k total,    22944k used,  2074200k free,  1601760k cached
>>>> >>> >
>>>> >>> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> >>> > 21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java (HRegionServer)
>>>> >>> > 20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java (DataNode)
>>>> >>> > 22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java (MapReduce Child)
>>>> >>> >
>>>> >>> > On Tue, Mar 17, 2009 at 3:38 PM, schubert zhang <zsongbo@gmail.com> wrote:
>>>> >>> >
>>>> >>> >> Hi all,
>>>> >>> >>
>>>> >>> >> I am running a MapReduce Job to read files and insert rows into
>>>> >>> >> an HBase table. It is like an ETL procedure in the database world.
>>>> >>> >>
>>>> >>> >> Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master,  DELL2950
>>>> server
>>>> >>> with 4GB memory and 1TB disk on each node.
>>>> >>> >>
>>>> >>> >> Issue 1.
>>>> >>> >>
>>>> >>> >> Each time the MapReduce Job eats some files. And I want to get a
>>>> >>> >> balance in this pipeline (I have defined the HBase table with a
>>>> >>> >> TTL), since many new incoming files need to be eaten. But sometimes
>>>> >>> >> many of the following RetriesExhaustedExceptions happen and the job
>>>> >>> >> is blocked for a long time. And then the MapReduce job falls behind
>>>> >>> >> and cannot catch up with the speed of incoming files.
>>>> >>> >>
>>>> >>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying
>>>> to
>>>> >>> contact region server Some server for region
>>>> TABA,113976305209@2009-03-17 13:11:04.135,1237268430402, row
>>>> '113976399906@2009-03-17 13:40:12.213', but failed after 10 attempts.
>>>> >>> >>
>>>> >>> >> Exceptions:
>>>> >>> >>
>>>> >>> >>       at
>>>> >>>
>>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>>>> >>> >>       at
>>>> >>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>>>> >>> >>       at
>>>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>>>> >>> >>       at
>>>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>>>> >>> >>       at
>>>> >>>
>>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>>>> >>> >>       at
>>>> >>>
>>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>>>> >>> >>       at
>>>> >>>
>>>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>>>> >>> >>       at
>>>> net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
>>>> >>> >>       at
>>>> net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>>>> >>> >>       at
>>>> org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>> >>> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>> >>> >>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
>>>> >>> >>
>>>> >>> >> I checked the code and found that the retry policy for this
>>>> >>> >> exception is like the BSD TCP syn backoff policy:
>>>> >>> >>
>>>> >>> >>   /**
>>>> >>> >>    * This is a retry backoff multiplier table similar to the BSD TCP
>>>> >>> >>    * syn backoff table, a bit more aggressive than simple exponential
>>>> >>> >>    * backoff.
>>>> >>> >>    */
>>>> >>> >>   public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 };
>>>> >>> >>
>>>> >>> >> So, 10 retries will take a long time: 71*2=142 seconds. So I think
>>>> >>> >> the following two parameters can be changed, and I reduced them:
>>>> >>> >> hbase.client.pause = 1 (second)
>>>> >>> >> hbase.client.retries.number=4
>>>> >>> >> Now, the block time should be 5 seconds (1+1+1+2 = 5 with a 1-second pause).
>>>> >>> >>
>>>> >>> >> In fact, sometimes I have hit such exceptions also in:
>>>> >>> >> getRegionServerWithRetries, and
>>>> >>> >> getRegionLocationForRowWithRetries, and
>>>> >>> >> processBatchOfRows
>>>> >>> >>
>>>> >>> >> I want to know what causes the above exception. Region splitting?
>>>> >>> >> Compaction? Or a region moving to another regionserver?
>>>> >>> >> This exception happens very frequently in my cluster.
>>>> >>> >>
>>>> >>> >> Issue 2: Also in the above workflow, I found the regionserver
>>>> >>> >> nodes are very busy, with a very heavy load.
>>>> >>> >> The HRegionServer process takes almost all of the used CPU.
>>>> >>> >> In my previous test round, I ran 4 map tasks to insert data on each
>>>> >>> >> node; now I just run 2 map tasks on each node. I think the
>>>> >>> >> performance of HRegionServer is not good enough for some of my
>>>> >>> >> applications. Besides the insert/input procedure, we will query
>>>> >>> >> rows from the same table for online applications, which need low
>>>> >>> >> latency. In my test experience, I find the online query (scanning
>>>> >>> >> about 50 rows with startRow and endRow) has a latency of 2-6
>>>> >>> >> seconds for the first query, and about 500ms for a re-query (see
>>>> >>> >> the scan sketch just below). And when the cluster is busy inserting
>>>> >>> >> data into the table, the query latency may be as long as 10-30
>>>> >>> >> seconds. And when the table is splitting/compacting, maybe longer.
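For reference, a minimal sketch of such a startRow/endRow scan, assuming the 0.19-era client API (a Scanner iterating RowResults); the table name, column family, and row keys are illustrative, not the real schema:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class OnlineScan {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "TABA"); // table assumed
    byte[][] columns = { Bytes.toBytes("info:") };               // family assumed
    byte[] startRow = Bytes.toBytes("13502070621@2009-03-18 00:00:00.000");
    byte[] stopRow  = Bytes.toBytes("13502070621@2009-03-19 00:00:00.000");
    Scanner scanner = table.getScanner(columns, startRow, stopRow);
    try {
      for (RowResult row : scanner) {              // expect ~50 rows in this range
        System.out.println(Bytes.toString(row.getRow()));
      }
    } finally {
      scanner.close(); // always release the scanner's server-side resources
    }
  }
}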
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>

Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by schubert zhang <zs...@gmail.com>.
Jean Daniel,
Now, I have running HBase0.19.1  candidate 2  and Hadoop 0.19.2 (The code is
got from hadoop branch-0.19, says the MapReduce bug has been fixed in this
branch.)

I am verifying, and need some hours for examination.

Schubert

On Wed, Mar 18, 2009 at 10:14 AM, schubert zhang <zs...@gmail.com> wrote:

> Jean Daniel,
> 1. If I keep the retries at 10, the task may wait for a long time (142
> seconds). So I want to try to configure
>  hbase.client.pause = 2
> hbase.client.retries.number=5
>  Then the waiting time would be 14 seconds.
>
> 2. My DELL 2950, each node have 1 CPU with 4 cores. but there is one node
> (as a slave) have 2 CPU with 2*4=8 cores.
>
> 3. The issue of hadoop 0.19.1 MapReduce is:
>    (1) Some times, some task slots on some tasktracker are un-usable.
>         For example, my total MapTask Capacity is 20, and there are 40
> maps need to be process, but only 16 or 13 running tasks at most.
>         I think may be it is caused by taskracker black-list feature, this
> issue may occur after some failed tasks.
>    (2) I am currently checking the logs of mapreduce, since last night, my
> MapReduce job was hung up. It stop at 79.9% for about 13 hours.
>         The issue like this:
> https://issues.apache.org/jira/browse/HADOOP-5367
>         After a long time running of MapReduce (about 200 jobs have
> completed). The MapReduce job is huaguped forever.
>
>        The JobTracker repeatly logs:
>        2009-03-17 16:29:39,997 INFO org.apache.hadoop.mapred.JobTracker:
> Serious problem. While updating status, cannot find taskid
> attempt_200903171247_0387_m_000015_1
>
>        All TaskTrackers may hungup at the sametime, since the logs of each
> TaskTracer stop at that time.
>
>        And nefore the hangup. I can also find odds and ends such logs of
> JobTracker, such as.
>
>        2009-03-17 16:29:21,767 INFO org.apache.hadoop.mapred.JobTracker:
> Serious problem. While updating status, cannot find taskid
> attempt_200903171247_0387_m_000015_1
> Schubert
>
>
> On Wed, Mar 18, 2009 at 9:46 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Schubert.
>>
>> Regards your previous email, I didn't notice something obvious. You
>> should keep the retries at 10 since you could easily have to wait more
>> than 10 seconds under heavy load.
>>
>> How many CPUs do you have on your Dell 2950?
>>
>> Regards your last email, HBase 0.19 isn't compatible with Hadoop 0.18.
>> Why do you say it's buggy?
>>
>> Thx,
>>
>> J-D
>>
>> On Tue, Mar 17, 2009 at 9:33 PM, schubert zhang <zs...@gmail.com>
>> wrote:
>> > Hi Jean Daniel,
>> > I want to try the HBase0.19.1.
>> > And since the mapreduce of hadoop-0.19.1 is buggy, I want to use a
>> stable
>> > hadoop 0.18.3, is the HBase 0.19.x can work on hadoop 0.18.3?
>> >
>> > Schubert
>> >
>> > On Wed, Mar 18, 2009 at 12:26 AM, schubert zhang <zs...@gmail.com>
>> wrote:
>> >
>> >> Jean Daniel,
>> >> Thank you very much and I will re-setup my cluster with HBase 0.19.1
>> >> immediately.
>> >> And I will change my swappiness configuration immediately.
>> >>
>> >> For the RetriesExhaustedException exception, I double checked the logs
>> of
>> >> each node.
>> >>
>> >> for example one such exception occur at 09/03/17 16:29:10 for Region:
>> >> 13975557938@2009-03-17 13:42:36.412
>> >>
>> >> 09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
>> >> 09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
>> >> attempt_200903171247_0387_m_000009_0, Status : FAILED
>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact
>> >> region server Some server for region TABA,13975557938@2009-03-1713:42:36.412,1237270434007,
>> row '13975559447@2009-03-1716:25:42.772', but failed after 4 attempts.
>> >> Exceptions:
>> >>
>> >>         at
>> >>
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>> >>         at
>> >> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>> >>         at
>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>> >>         at
>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>> >>         at
>> >>
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>> >>         at
>> >>
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>> >>         at
>> >>
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
>> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>> >>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>> >>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>> >>
>> >> HMaster log:
>> >> 2009-03-17 16:29:16,501 INFO
>> org.apache.hadoop.hbase.master.ServerManager:
>> >> Received MSG_REPORT_CLOSE: TABA,13975557938@2009-03-1713:42:36.412,1237270434007
>> from
>> >> 10.24.1.20:60020
>> >> 2009-03-17 16:29:18,707 INFO
>> org.apache.hadoop.hbase.master.RegionManager:
>> >> assigning region TABA,13975557938@2009-03-1713:42:36.412,1237270434007 to
>> >> server 10.24.1.14:60020
>> >> 2009-03-17 16:29:19,795 INFO
>> org.apache.hadoop.hbase.master.BaseScanner:
>> >> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
>> >> startKey: <>, server: 10.24.1.12:60020}
>> >> 2009-03-17 16:29:19,857 INFO
>> org.apache.hadoop.hbase.master.BaseScanner:
>> >> RegionManager.metaScanner scan of 34 row(s) of meta region {regionname:
>> >> .META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
>> >> 2009-03-17 16:29:19,857 INFO
>> org.apache.hadoop.hbase.master.BaseScanner:
>> >> All 1 .META. region(s) scanned
>> >> 2009-03-17 16:29:21,723 INFO
>> org.apache.hadoop.hbase.master.ServerManager:
>> >> Received MSG_REPORT_PROCESS_OPEN: TABA,13975557938@2009-03-1713:42:36.412,1237270434007
>> from
>> >> 10.24.1.14:60020
>> >> 2009-03-17 16:29:21,723 INFO
>> org.apache.hadoop.hbase.master.ServerManager:
>> >> Received MSG_REPORT_OPEN: TABA,13975557938@2009-03-1713:42:36.412,1237270434007
>> from
>> >> 10.24.1.14:60020
>> >> 2009-03-17 16:29:21,723 INFO
>> >> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
>> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on
>> >> 10.24.1.14:60020
>> >> 2009-03-17 16:29:21,723 INFO
>> >> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
>> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region
>> .META.,,1
>> >> with startcode 1237274685627 and server 10.24.1.14:60020
>> >>
>> >> one node:
>> >> 2009-03-17 16:28:43,617 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>> >> 2009-03-17 16:28:43,617 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>> MSG_REGION_OPEN:
>> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>> >> 2009-03-17 16:28:44,604 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> region TABA,13975557938@2009-03-1713:42:36.412,1237270434007/168627948
>> >> available
>> >> 2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog:
>> >> Closed
>> >>
>> hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283,
>> >> entries=100008. New log writer:
>> >> /hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
>> >> 2009-03-17 16:29:04,920 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> compaction completed on region TABA,13675519693@2009-03-1713:39:35.123,1237270466558
>> in 28sec
>> >> 2009-03-17 16:29:04,920 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> starting  compaction on region TABA,13702099018@2009-03-1714
>> :00:04.791,1237270570714
>> >> 2009-03-17 16:29:04,927 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> compaction completed on region TABA,13702099018@2009-03-1714:00:04.791,1237270570714
>> in 0sec
>> >>
>> >> another node:
>> >> 2009-03-17 16:28:56,462 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> starting  compaction on region CDR,13975557938@2009-03-1713
>> :42:36.412,1237270434007
>> >> 2009-03-17 16:28:58,304 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> >> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>> >> 2009-03-17 16:28:58,305 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> >> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>> >> 2009-03-17 16:28:58,305 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>> MSG_REGION_OPEN:
>> >> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>> >> 2009-03-17 16:28:58,343 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> region CDR,13775569214@2009-03-1713:13:57.898,1237278535949/1626944070
>> >> available
>> >> 2009-03-17 16:28:58,343 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>> MSG_REGION_OPEN:
>> >> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,387 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> >> brand-new decompressor
>> >> 2009-03-17 16:28:58,388 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> >> region CDR,13775545668@2009-03-1713:03:35.830,1237278535949/1656930599
>> >> available
>> >> 2009-03-17 16:29:01,325 INFO
>> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
>> >> CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
>> >>
>> >> On Tue, Mar 17, 2009 at 8:02 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >>
>> >>> Schubert,
>> >>>
>> >>> Yeah latency was always an issue in HBase since reliability was our
>> >>> focus. This is going to be fixed in 0.20 with a new file format and
>> >>> caching of hot cells.
>> >>>
>> >>> Regards your problem, please enable DEBUG and look in your
>> >>> regionserver logs to see what's happening. Usually, if you get a
>> >>> RetriesExhausted, it's because something is taking too long to get a
>> >>> region reassigned.
>> >>>
>> >>> I also suggest that you upgrade to HBase 0.19.1 candidate 2 which has
>> >>> 2 fix for massive uploads like yours.
>> >>>
>> >>> I also see that there was some swap, please try setting the swappiness
>> >>> to something like 20. It helped me a lot!
>> >>> http://www.sollers.ca/blog/2008/swappiness/
>> >>>
>> >>> J-D
>> >>>
>> >>> On Tue, Mar 17, 2009 at 3:49 AM, schubert zhang <zs...@gmail.com>
>> >>> wrote:
>> >>> > This the "top" info of a regionserver/datanode/tasktracker node.We
>> can
>> >>> see
>> >>> > the HRegionServer node is very heavy-loaded.
>> >>> >
>> >>> > top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
>> >>> > Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
>> >>> > Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,
>>  1.2%si,
>> >>> >  0.0%st
>> >>> > Mem:   4043484k total,  2816028k used,  1227456k free,    18308k
>> buffers
>> >>> > Swap:  2097144k total,    22944k used,  2074200k free,  1601760k
>> cached
>> >>> >
>> >>> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >>> >
>> >>> > 21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java
>> >>> > (HRegionServer)
>> >>> > 20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java
>> >>> (DataNode)
>> >>> >
>> >>> > 22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java
>> >>> (MapReduce
>> >>> > Child)
>> >>> >
>> >>> > On Tue, Mar 17, 2009 at 3:38 PM, schubert zhang <zs...@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> >> Hi all,
>> >>> >>
>> >>> >> I am running a MapReduce Job to read files and insert rows into a
>> HBase
>> >>> table. It like a ETL procedure in database world.
>> >>> >>
>> >>> >> Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master,  DELL2950
>> server
>> >>> with 4GB memory and 1TB disk on each node.
>> >>> >>
>> >>> >> Issue 1.
>> >>> >>
>> >>> >> Each time the MapReduce Job eat some files. And I want to get a
>> balance
>> >>> in this pipe-line (I have defined the HBase table with TTL) since many
>> new
>> >>> incoming files need to be eat. But sometimes many following
>> >>> RetriesExhaustedException happen and the job is blocked for a long
>> time. And
>> >>> the the MapReduce job will be behindhand and cannot catch up with the
>> speed
>> >>> of incoming files.
>> >>> >>
>> >>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> >>> contact region server Some server for region
>> TABA,113976305209@2009-03-17 13:11:04.135,1237268430402, row
>> '113976399906@2009-03-17 13:40:12.213', but failed after 10 attempts.
>> >>> >>
>> >>> >> Exceptions:
>> >>> >>
>> >>> >>       at
>> >>>
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>> >>> >>       at
>> >>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>> >>> >>       at
>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>> >>> >>       at
>> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>> >>> >>       at
>> >>>
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>> >>> >>       at
>> >>>
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>> >>> >>       at
>> >>>
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>> >>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
>> >>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>> >>> >>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >>> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>> >>> >>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
>> >>> >>
>> >>> >> I check the code and find the retry policy when this exception is
>> like
>> >>> BSD TCP syn backoff policy:
>> >>> >>
>> >>> >>   /**
>> >>> >>    * This is a retry backoff multiplier table similar to the BSD
>> TCP
>> >>> syn
>> >>> >>    * backoff table, a bit more aggressive than simple exponential
>> >>> backoff.
>> >>> >>    */
>> >>> >>   public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16,
>> 32
>> >>> };
>> >>> >>
>> >>> >> So, 10 retries will table long time: 71*2=142 seconds. So I think
>> >>> following
>> >>> >> tow parameters can be changed and I reduce them.
>> >>> >> hbase.client.pause = 1 (second)
>> >>> >> hbase.client.retries.number=4
>> >>> >> Now, the block time should be 5 seconds.
>> >>> >>
>> >>> >> In fact, sometimes I have met such exception also when:
>> >>> >> getRegionServerWithRetries, and
>> >>> >> getRegionLocationForRowWithRetries, and
>> >>> >> processBatchOfRows
>> >>> >>
>> >>> >> I want to know what cause above exception happen. Region spliting?
>> >>> >> Compaction? or Region moving to other regionserver?
>> >>> >> This exception happens very frequently in my cluster.
>> >>> >>
>> >>> >> Issue2: Also above workflow, I found the regionserver nodes are
>> very
>> >>> busy
>> >>> >> wigh very heavy load.
>> >>> >> The process of HRegionServer takes almost all of the used CPU load.
>> >>> >> In my previous test round, I ran 4 maptask to insert data in each
>> node,
>> >>> now
>> >>> >> I just run 2 maptask in each node. I think the performance of
>> >>> HRegionServer
>> >>> >> is not so good to achieve some of my applications. Because except
>> for
>> >>> the
>> >>> >> insert/input data procedure, we will query rows from the same table
>> for
>> >>> >> online applications, which need low-latency. I my test experience,
>> I
>> >>> find
>> >>> >> the online query (scan about 50 rows with startRow and endRow)
>> latency
>> >>> is
>> >>> >> 2-6 seconds for the first query, and about 500ms for the re-query.
>> And
>> >>> when
>> >>> >> the cluster is busy for inserting data to table, the query latency
>> may
>> >>> as
>> >>> >> long as 10-30 seconds. And when the table is in sliping/compaction,
>> >>> maybe
>> >>> >> longer.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
>

Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by schubert zhang <zs...@gmail.com>.
Jean Daniel,
1. If I keep the retries at 10, the task may wait for a long time (142
seconds). So I want to try configuring
hbase.client.pause = 2 (seconds)
hbase.client.retries.number = 5
Then the waiting time would be 14 seconds (see the first sketch after this
list).

2. On my DELL 2950s, each node has 1 CPU with 4 cores, but there is one node
(a slave) that has 2 CPUs with 2*4=8 cores.

3. The issues with Hadoop 0.19.1 MapReduce are:
   (1) Sometimes, some task slots on some TaskTrackers are unusable.
        For example, my total map task capacity is 20 and there are 40 maps
waiting to be processed, but at most only 16 or 13 tasks are running.
        I think this may be caused by the TaskTracker blacklist feature; the
issue can occur after some failed tasks (see the second sketch after this
list).
   (2) I am currently checking the MapReduce logs; since last night, my
MapReduce job has been hung. It has been stuck at 79.9% for about 13 hours.
        The issue looks like this one:
https://issues.apache.org/jira/browse/HADOOP-5367
        After MapReduce had been running for a long time (about 200 jobs had
completed), the job hung forever.

       The JobTracker repeatedly logs:
       2009-03-17 16:29:39,997 INFO org.apache.hadoop.mapred.JobTracker:
Serious problem. While updating status, cannot find taskid
attempt_200903171247_0387_m_000015_1

       All TaskTrackers may have hung at the same time, since the logs of each
TaskTracker stop at that point.

       And before the hangup, I can also find scattered JobTracker log entries
like this:

       2009-03-17 16:29:21,767 INFO org.apache.hadoop.mapred.JobTracker:
Serious problem. While updating status, cannot find taskid
attempt_200903171247_0387_m_000015_1
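
To sanity-check the retry arithmetic in point 1, here is a small stand-alone
Java sketch (my own illustration, not HBase client code) that sums the
RETRY_BACKOFF table quoted earlier in this thread for a given pause and retry
count:

  public class RetryWaitEstimate {
    // Copied from the HBase source quoted earlier in this thread.
    static final int[] RETRY_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    // Worst-case seconds the client sleeps before throwing
    // RetriesExhaustedException, ignoring the failing RPCs themselves.
    static long totalWaitSeconds(long pauseSeconds, int retries) {
      long total = 0;
      for (int i = 0; i < retries && i < RETRY_BACKOFF.length; i++) {
        total += pauseSeconds * RETRY_BACKOFF[i];
      }
      return total;
    }

    public static void main(String[] args) {
      System.out.println(totalWaitSeconds(2, 10)); // 142: pause=2, 10 retries
      System.out.println(totalWaitSeconds(2, 5));  // 14:  pause=2, 5 retries
    }
  }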
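
For point 3 (1), a per-job knob in the old mapred API can loosen the
blacklisting. This is only a sketch, under my assumption that per-job tracker
blacklisting is what idles the slots; the JobTuning class and the threshold
of 8 are hypothetical:

  import org.apache.hadoop.mapred.JobConf;

  public class JobTuning {
    // Default is 4: after 4 failed attempts of this job's tasks on one
    // TaskTracker, that tracker is blacklisted for the job, which can
    // leave map slots idle (e.g. capacity 20 but only 13-16 running).
    public static JobConf configure(Class<?> jobClass) {
      JobConf conf = new JobConf(jobClass);
      conf.setMaxTaskFailuresPerTracker(8); // raise the per-job threshold
      return conf;
    }
  }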
Schubert

On Wed, Mar 18, 2009 at 9:46 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Schubert.
>
> Regards your previous email, I didn't notice something obvious. You
> should keep the retries at 10 since you could easily have to wait more
> than 10 seconds under heavy load.
>
> How many CPUs do you have on your Dell 2950?
>
> Regards your last email, HBase 0.19 isn't compatible with Hadoop 0.18.
> Why do you say it's buggy?
>
> Thx,
>
> J-D
>
> On Tue, Mar 17, 2009 at 9:33 PM, schubert zhang <zs...@gmail.com> wrote:
> > Hi Jean Daniel,
> > I want to try the HBase0.19.1.
> > And since the mapreduce of hadoop-0.19.1 is buggy, I want to use a stable
> > hadoop 0.18.3, is the HBase 0.19.x can work on hadoop 0.18.3?
> >
> > Schubert
> >
> > On Wed, Mar 18, 2009 at 12:26 AM, schubert zhang <zs...@gmail.com>
> wrote:
> >
> >> Jean Daniel,
> >> Thank you very much and I will re-setup my cluster with HBase 0.19.1
> >> immediately.
> >> And I will change my swappiness configuration immediately.
> >>
> >> For the RetriesExhaustedException exception, I double checked the logs
> of
> >> each node.
> >>
> >> for example one such exception occur at 09/03/17 16:29:10 for Region:
> >> 13975557938@2009-03-17 13:42:36.412
> >>
> >> 09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
> >> 09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
> >> attempt_200903171247_0387_m_000009_0, Status : FAILED
> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact
> >> region server Some server for region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007,
> row '13975559447@2009-03-17 16:25:42.772', but failed after 4 attempts.
> >> Exceptions:
> >>
> >>         at
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
> >>         at
> >> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
> >>         at
> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
> >>         at
> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
> >>         at
> >>
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
> >>         at
> >>
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
> >>         at
> >>
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
> >>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
> >>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> >>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >>
> >> HMaster log:
> >> 2009-03-17 16:29:16,501 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> >> Received MSG_REPORT_CLOSE: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> from
> >> 10.24.1.20:60020
> >> 2009-03-17 16:29:18,707 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> >> assigning region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> to
> >> server 10.24.1.14:60020
> >> 2009-03-17 16:29:19,795 INFO org.apache.hadoop.hbase.master.BaseScanner:
> >> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
> >> startKey: <>, server: 10.24.1.12:60020}
> >> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
> >> RegionManager.metaScanner scan of 34 row(s) of meta region {regionname:
> >> .META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
> >> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
> >> All 1 .META. region(s) scanned
> >> 2009-03-17 16:29:21,723 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> >> Received MSG_REPORT_PROCESS_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> from
> >> 10.24.1.14:60020
> >> 2009-03-17 16:29:21,723 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> >> Received MSG_REPORT_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> from
> >> 10.24.1.14:60020
> >> 2009-03-17 16:29:21,723 INFO
> >> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on
> >> 10.24.1.14:60020
> >> 2009-03-17 16:29:21,723 INFO
> >> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region
> .META.,,1
> >> with startcode 1237274685627 and server 10.24.1.14:60020
> >>
> >> one node:
> >> 2009-03-17 16:28:43,617 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> >> 2009-03-17 16:28:43,617 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN:
> >> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> >> 2009-03-17 16:28:44,604 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007/168627948
> >> available
> >> 2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog:
> >> Closed
> >>
> hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283,
> >> entries=100008. New log writer:
> >> /hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
> >> 2009-03-17 16:29:04,920 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> compaction completed on region TABA,13675519693@2009-03-17 13:39:35.123,1237270466558
> in 28sec
> >> 2009-03-17 16:29:04,920 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> starting  compaction on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
> >> 2009-03-17 16:29:04,927 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> compaction completed on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
> in 0sec
> >>
> >> another node:
> >> 2009-03-17 16:28:56,462 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> starting  compaction on region CDR,13975557938@2009-03-17 13:42:36.412,1237270434007
> >> 2009-03-17 16:28:58,304 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> >> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
> >> 2009-03-17 16:28:58,305 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> >> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
> >> 2009-03-17 16:28:58,305 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN:
> >> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
> >> 2009-03-17 16:28:58,343 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> region CDR,13775569214@2009-03-17 13:13:57.898,1237278535949/1626944070
> >> available
> >> 2009-03-17 16:28:58,343 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN:
> >> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,387 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >> brand-new decompressor
> >> 2009-03-17 16:28:58,388 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> >> region CDR,13775545668@2009-03-17 13:03:35.830,1237278535949/1656930599
> >> available
> >> 2009-03-17 16:29:01,325 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> >> CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
> >>
> >> On Tue, Mar 17, 2009 at 8:02 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >>
> >>> Schubert,
> >>>
> >>> Yeah latency was always an issue in HBase since reliability was our
> >>> focus. This is going to be fixed in 0.20 with a new file format and
> >>> caching of hot cells.
> >>>
> >>> Regards your problem, please enable DEBUG and look in your
> >>> regionserver logs to see what's happening. Usually, if you get a
> >>> RetriesExhausted, it's because something is taking too long to get a
> >>> region reassigned.
> >>>
> >>> I also suggest that you upgrade to HBase 0.19.1 candidate 2 which has
> >>> 2 fix for massive uploads like yours.
> >>>
> >>> I also see that there was some swap, please try setting the swappiness
> >>> to something like 20. It helped me a lot!
> >>> http://www.sollers.ca/blog/2008/swappiness/
> >>>
> >>> J-D
> >>>
> >>> On Tue, Mar 17, 2009 at 3:49 AM, schubert zhang <zs...@gmail.com>
> >>> wrote:
> >>> > This the "top" info of a regionserver/datanode/tasktracker node.We
> can
> >>> see
> >>> > the HRegionServer node is very heavy-loaded.
> >>> >
> >>> > top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
> >>> > Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
> >>> > Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,
>  1.2%si,
> >>> >  0.0%st
> >>> > Mem:   4043484k total,  2816028k used,  1227456k free,    18308k
> buffers
> >>> > Swap:  2097144k total,    22944k used,  2074200k free,  1601760k
> cached
> >>> >
> >>> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >>> >
> >>> > 21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java
> >>> > (HRegionServer)
> >>> > 20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java
> >>> (DataNode)
> >>> >
> >>> > 22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java
> >>> (MapReduce
> >>> > Child)
> >>> >
> >>> > On Tue, Mar 17, 2009 at 3:38 PM, schubert zhang <zs...@gmail.com>
> >>> wrote:
> >>> >
> >>> >> Hi all,
> >>> >>
> >>> >> I am running a MapReduce Job to read files and insert rows into a
> HBase
> >>> table. It like a ETL procedure in database world.
> >>> >>
> >>> >> Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master,  DELL2950 server
> >>> with 4GB memory and 1TB disk on each node.
> >>> >>
> >>> >> Issue 1.
> >>> >>
> >>> >> Each time the MapReduce Job eat some files. And I want to get a
> balance
> >>> in this pipe-line (I have defined the HBase table with TTL) since many
> new
> >>> incoming files need to be eat. But sometimes many following
> >>> RetriesExhaustedException happen and the job is blocked for a long
> time. And
> >>> the the MapReduce job will be behindhand and cannot catch up with the
> speed
> >>> of incoming files.
> >>> >>
> >>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> >>> contact region server Some server for region
> TABA,113976305209@2009-03-17 13:11:04.135,1237268430402, row
> '113976399906@2009-03-17 13:40:12.213', but failed after 10 attempts.
> >>> >>
> >>> >> Exceptions:
> >>> >>
> >>> >>       at
> >>>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
> >>> >>       at
> >>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
> >>> >>       at
> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
> >>> >>       at
> org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
> >>> >>       at
> >>>
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
> >>> >>       at
> >>>
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
> >>> >>       at
> >>>
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
> >>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
> >>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
> >>> >>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> >>> >>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >>> >>
> >>> >> I check the code and find the retry policy when this exception is
> like
> >>> BSD TCP syn backoff policy:
> >>> >>
> >>> >>   /**
> >>> >>    * This is a retry backoff multiplier table similar to the BSD TCP
> >>> syn
> >>> >>    * backoff table, a bit more aggressive than simple exponential
> >>> backoff.
> >>> >>    */
> >>> >>   public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16,
> 32
> >>> };
> >>> >>
> >>> >> So, 10 retries will table long time: 71*2=142 seconds. So I think
> >>> following
> >>> >> tow parameters can be changed and I reduce them.
> >>> >> hbase.client.pause = 1 (second)
> >>> >> hbase.client.retries.number=4
> >>> >> Now, the block time should be 5 seconds.
> >>> >>
> >>> >> In fact, sometimes I have met such exception also when:
> >>> >> getRegionServerWithRetries, and
> >>> >> getRegionLocationForRowWithRetries, and
> >>> >> processBatchOfRows
> >>> >>
> >>> >> I want to know what cause above exception happen. Region spliting?
> >>> >> Compaction? or Region moving to other regionserver?
> >>> >> This exception happens very frequently in my cluster.
> >>> >>
> >>> >> Issue2: Also above workflow, I found the regionserver nodes are very
> >>> busy
> >>> >> wigh very heavy load.
> >>> >> The process of HRegionServer takes almost all of the used CPU load.
> >>> >> In my previous test round, I ran 4 maptask to insert data in each
> node,
> >>> now
> >>> >> I just run 2 maptask in each node. I think the performance of
> >>> HRegionServer
> >>> >> is not so good to achieve some of my applications. Because except
> for
> >>> the
> >>> >> insert/input data procedure, we will query rows from the same table
> for
> >>> >> online applications, which need low-latency. I my test experience, I
> >>> find
> >>> >> the online query (scan about 50 rows with startRow and endRow)
> latency
> >>> is
> >>> >> 2-6 seconds for the first query, and about 500ms for the re-query.
> And
> >>> when
> >>> >> the cluster is busy for inserting data to table, the query latency
> may
> >>> as
> >>> >> long as 10-30 seconds. And when the table is in sliping/compaction,
> >>> maybe
> >>> >> longer.
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>
> >>
> >
>

Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Schubert.

Regarding your previous email, I missed something obvious: you should
keep the retries at 10, since you could easily have to wait more than
10 seconds under heavy load (a client-side sketch follows below).
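
For reference, a minimal client-side sketch of keeping the default 10
retries (assuming the 0.19-era HBaseConfiguration/HTable API; note that
hbase.client.pause is specified in milliseconds, so 2 seconds is 2000):

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class ClientRetrySetup {
    public static HTable openTable(String name) throws IOException {
      HBaseConfiguration conf = new HBaseConfiguration();
      conf.setInt("hbase.client.retries.number", 10); // keep the default
      conf.setLong("hbase.client.pause", 2000);       // 2 s between retries
      return new HTable(conf, name);
    }
  }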

How many CPUs do you have on your Dell 2950?

Regarding your last email, HBase 0.19 isn't compatible with Hadoop 0.18.
Why do you say it's buggy?

Thx,

J-D

On Tue, Mar 17, 2009 at 9:33 PM, schubert zhang <zs...@gmail.com> wrote:
> Hi Jean Daniel,
> I want to try the HBase0.19.1.
> And since the mapreduce of hadoop-0.19.1 is buggy, I want to use a stable
> hadoop 0.18.3, is the HBase 0.19.x can work on hadoop 0.18.3?
>
> Schubert
>
> On Wed, Mar 18, 2009 at 12:26 AM, schubert zhang <zs...@gmail.com> wrote:
>
>> Jean Daniel,
>> Thank you very much and I will re-setup my cluster with HBase 0.19.1
>> immediately.
>> And I will change my swappiness configuration immediately.
>>
>> For the RetriesExhaustedException exception, I double checked the logs of
>> each node.
>>
>> for example one such exception occur at 09/03/17 16:29:10 for Region:
>> 13975557938@2009-03-17 13:42:36.412
>>
>> 09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
>> 09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
>> attempt_200903171247_0387_m_000009_0, Status : FAILED
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> region server Some server for region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007, row '13975559447@2009-03-17 16:25:42.772', but failed after 4 attempts.
>> Exceptions:
>>
>>         at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>>         at
>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>>         at
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>>         at
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>>         at
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
>>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>>
>> HMaster log:
>> 2009-03-17 16:29:16,501 INFO org.apache.hadoop.hbase.master.ServerManager:
>> Received MSG_REPORT_CLOSE: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
>> 10.24.1.20:60020
>> 2009-03-17 16:29:18,707 INFO org.apache.hadoop.hbase.master.RegionManager:
>> assigning region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 to
>> server 10.24.1.14:60020
>> 2009-03-17 16:29:19,795 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
>> startKey: <>, server: 10.24.1.12:60020}
>> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> RegionManager.metaScanner scan of 34 row(s) of meta region {regionname:
>> .META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
>> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> All 1 .META. region(s) scanned
>> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
>> Received MSG_REPORT_PROCESS_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
>> 10.24.1.14:60020
>> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
>> Received MSG_REPORT_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
>> 10.24.1.14:60020
>> 2009-03-17 16:29:21,723 INFO
>> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
>> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on
>> 10.24.1.14:60020
>> 2009-03-17 16:29:21,723 INFO
>> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
>> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region .META.,,1
>> with startcode 1237274685627 and server 10.24.1.14:60020
>>
>> one node:
>> 2009-03-17 16:28:43,617 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>> 2009-03-17 16:28:43,617 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
>> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
>> 2009-03-17 16:28:44,604 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007/168627948
>> available
>> 2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog:
>> Closed
>> hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283,
>> entries=100008. New log writer:
>> /hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
>> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> compaction completed on region TABA,13675519693@2009-03-17 13:39:35.123,1237270466558 in 28sec
>> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> starting  compaction on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
>> 2009-03-17 16:29:04,927 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> compaction completed on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714 in 0sec
>>
>> another node:
>> 2009-03-17 16:28:56,462 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> starting  compaction on region CDR,13975557938@2009-03-17 13:42:36.412,1237270434007
>> 2009-03-17 16:28:58,304 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>> 2009-03-17 16:28:58,305 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>> 2009-03-17 16:28:58,305 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
>> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
>> 2009-03-17 16:28:58,343 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> region CDR,13775569214@2009-03-17 13:13:57.898,1237278535949/1626944070
>> available
>> 2009-03-17 16:28:58,343 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
>> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
>> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,387 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new decompressor
>> 2009-03-17 16:28:58,388 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> region CDR,13775545668@2009-03-17 13:03:35.830,1237278535949/1656930599
>> available
>> 2009-03-17 16:29:01,325 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
>> CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
>>
>> On Tue, Mar 17, 2009 at 8:02 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>>
>>> Schubert,
>>>
>>> Yeah latency was always an issue in HBase since reliability was our
>>> focus. This is going to be fixed in 0.20 with a new file format and
>>> caching of hot cells.
>>>
>>> Regards your problem, please enable DEBUG and look in your
>>> regionserver logs to see what's happening. Usually, if you get a
>>> RetriesExhausted, it's because something is taking too long to get a
>>> region reassigned.
>>>
>>> I also suggest that you upgrade to HBase 0.19.1 candidate 2 which has
>>> 2 fix for massive uploads like yours.
>>>
>>> I also see that there was some swap, please try setting the swappiness
>>> to something like 20. It helped me a lot!
>>> http://www.sollers.ca/blog/2008/swappiness/
>>>
>>> J-D
>>>
>>> On Tue, Mar 17, 2009 at 3:49 AM, schubert zhang <zs...@gmail.com>
>>> wrote:
>>> > This the "top" info of a regionserver/datanode/tasktracker node.We can
>>> see
>>> > the HRegionServer node is very heavy-loaded.
>>> >
>>> > top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
>>> > Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
>>> > Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,  1.2%si,
>>> >  0.0%st
>>> > Mem:   4043484k total,  2816028k used,  1227456k free,    18308k buffers
>>> > Swap:  2097144k total,    22944k used,  2074200k free,  1601760k cached
>>> >
>>> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> >
>>> > 21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java
>>> > (HRegionServer)
>>> > 20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java
>>> (DataNode)
>>> >
>>> > 22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java
>>> (MapReduce
>>> > Child)
>>> >
>>> > On Tue, Mar 17, 2009 at 3:38 PM, schubert zhang <zs...@gmail.com>
>>> wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I am running a MapReduce Job to read files and insert rows into a HBase
>>> table. It like a ETL procedure in database world.
>>> >>
>>> >> Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master,  DELL2950 server
>>> with 4GB memory and 1TB disk on each node.
>>> >>
>>> >> Issue 1.
>>> >>
>>> >> Each time the MapReduce Job eat some files. And I want to get a balance
>>> in this pipe-line (I have defined the HBase table with TTL) since many new
>>> incoming files need to be eat. But sometimes many following
>>> RetriesExhaustedException happen and the job is blocked for a long time. And
>>> the the MapReduce job will be behindhand and cannot catch up with the speed
>>> of incoming files.
>>> >>
>>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>> contact region server Some server for region TABA,113976305209@2009-03-17 13:11:04.135,1237268430402, row '113976399906@2009-03-17 13:40:12.213', but failed after 10 attempts.
>>> >>
>>> >> Exceptions:
>>> >>
>>> >>       at
>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>>> >>       at
>>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>>> >>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>>> >>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>>> >>       at
>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>>> >>       at
>>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>>> >>       at
>>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
>>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>>> >>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>> >>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
>>> >>
>>> >> I check the code and find the retry policy when this exception is like
>>> BSD TCP syn backoff policy:
>>> >>
>>> >>   /**
>>> >>    * This is a retry backoff multiplier table similar to the BSD TCP
>>> syn
>>> >>    * backoff table, a bit more aggressive than simple exponential
>>> backoff.
>>> >>    */
>>> >>   public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32
>>> };
>>> >>
>>> >> So, 10 retries will table long time: 71*2=142 seconds. So I think
>>> following
>>> >> tow parameters can be changed and I reduce them.
>>> >> hbase.client.pause = 1 (second)
>>> >> hbase.client.retries.number=4
>>> >> Now, the block time should be 5 seconds.
>>> >>
>>> >> In fact, sometimes I have met such exception also when:
>>> >> getRegionServerWithRetries, and
>>> >> getRegionLocationForRowWithRetries, and
>>> >> processBatchOfRows
>>> >>
>>> >> I want to know what cause above exception happen. Region spliting?
>>> >> Compaction? or Region moving to other regionserver?
>>> >> This exception happens very frequently in my cluster.
>>> >>
>>> >> Issue2: Also above workflow, I found the regionserver nodes are very
>>> busy
>>> >> wigh very heavy load.
>>> >> The process of HRegionServer takes almost all of the used CPU load.
>>> >> In my previous test round, I ran 4 maptask to insert data in each node,
>>> now
>>> >> I just run 2 maptask in each node. I think the performance of
>>> HRegionServer
>>> >> is not so good to achieve some of my applications. Because except for
>>> the
>>> >> insert/input data procedure, we will query rows from the same table for
>>> >> online applications, which need low-latency. I my test experience, I
>>> find
>>> >> the online query (scan about 50 rows with startRow and endRow) latency
>>> is
>>> >> 2-6 seconds for the first query, and about 500ms for the re-query. And
>>> when
>>> >> the cluster is busy for inserting data to table, the query latency may
>>> as
>>> >> long as 10-30 seconds. And when the table is in sliping/compaction,
>>> maybe
>>> >> longer.
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>
>>
>

Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by schubert zhang <zs...@gmail.com>.
Hi Jean Daniel,
I want to try HBase 0.19.1.
And since the MapReduce of Hadoop 0.19.1 is buggy, I want to use the stable
Hadoop 0.18.3. Can HBase 0.19.x work on Hadoop 0.18.3?

Schubert

On Wed, Mar 18, 2009 at 12:26 AM, schubert zhang <zs...@gmail.com> wrote:

> Jean Daniel,
> Thank you very much and I will re-setup my cluster with HBase 0.19.1
> immediately.
> And I will change my swappiness configuration immediately.
>
> For the RetriesExhaustedException exception, I double checked the logs of
> each node.
>
> for example one such exception occur at 09/03/17 16:29:10 for Region:
> 13975557938@2009-03-17 13:42:36.412
>
> 09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
> 09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
> attempt_200903171247_0387_m_000009_0, Status : FAILED
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007, row '13975559447@2009-03-17 16:25:42.772', but failed after 4 attempts.
> Exceptions:
>
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>         at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>         at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>         at
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>         at
> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>         at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
>         at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
> HMaster log:
> 2009-03-17 16:29:16,501 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received MSG_REPORT_CLOSE: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
> 10.24.1.20:60020
> 2009-03-17 16:29:18,707 INFO org.apache.hadoop.hbase.master.RegionManager:
> assigning region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 to
> server 10.24.1.14:60020
> 2009-03-17 16:29:19,795 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
> startKey: <>, server: 10.24.1.12:60020}
> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 34 row(s) of meta region {regionname:
> .META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
> 2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
> All 1 .META. region(s) scanned
> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received MSG_REPORT_PROCESS_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
> 10.24.1.14:60020
> 2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received MSG_REPORT_OPEN: TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
> 10.24.1.14:60020
> 2009-03-17 16:29:21,723 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on
> 10.24.1.14:60020
> 2009-03-17 16:29:21,723 INFO
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region .META.,,1
> with startcode 1237274685627 and server 10.24.1.14:60020
>
> one node:
> 2009-03-17 16:28:43,617 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> 2009-03-17 16:28:43,617 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
> 2009-03-17 16:28:44,604 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007/168627948
> available
> 2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Closed
> hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283,
> entries=100008. New log writer:
> /hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region TABA,13675519693@2009-03-17 13:39:35.123,1237270466558 in 28sec
> 2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
> 2009-03-17 16:29:04,927 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region TABA,13702099018@2009-03-17 14:00:04.791,1237270570714 in 0sec
>
> another node:
> 2009-03-17 16:28:56,462 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting  compaction on region CDR,13975557938@2009-03-17 13:42:36.412,1237270434007
> 2009-03-17 16:28:58,304 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
> 2009-03-17 16:28:58,305 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
> 2009-03-17 16:28:58,305 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
> 2009-03-17 16:28:58,343 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region CDR,13775569214@2009-03-17 13:13:57.898,1237278535949/1626944070
> available
> 2009-03-17 16:28:58,343 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,383 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,384 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,386 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,387 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new decompressor
> 2009-03-17 16:28:58,388 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region CDR,13775545668@2009-03-17 13:03:35.830,1237278535949/1656930599
> available
> 2009-03-17 16:29:01,325 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
>
> On Tue, Mar 17, 2009 at 8:02 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Schubert,
>>
>> Yeah latency was always an issue in HBase since reliability was our
>> focus. This is going to be fixed in 0.20 with a new file format and
>> caching of hot cells.
>>
>> Regards your problem, please enable DEBUG and look in your
>> regionserver logs to see what's happening. Usually, if you get a
>> RetriesExhausted, it's because something is taking too long to get a
>> region reassigned.
>>
>> I also suggest that you upgrade to HBase 0.19.1 candidate 2 which has
>> 2 fix for massive uploads like yours.
>>
>> I also see that there was some swap, please try setting the swappiness
>> to something like 20. It helped me a lot!
>> http://www.sollers.ca/blog/2008/swappiness/
>>
>> J-D
>>
>> On Tue, Mar 17, 2009 at 3:49 AM, schubert zhang <zs...@gmail.com>
>> wrote:
>> > This the "top" info of a regionserver/datanode/tasktracker node.We can
>> see
>> > the HRegionServer node is very heavy-loaded.
>> >
>> > top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
>> > Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
>> > Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,  1.2%si,
>> >  0.0%st
>> > Mem:   4043484k total,  2816028k used,  1227456k free,    18308k buffers
>> > Swap:  2097144k total,    22944k used,  2074200k free,  1601760k cached
>> >
>> >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >
>> > 21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java
>> > (HRegionServer)
>> > 20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java
>> (DataNode)
>> >
>> > 22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java
>> (MapReduce
>> > Child)
>> >
>> > On Tue, Mar 17, 2009 at 3:38 PM, schubert zhang <zs...@gmail.com>
>> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I am running a MapReduce Job to read files and insert rows into a HBase
>> table. It like a ETL procedure in database world.
>> >>
>> >> Hadoop 0.19.1, HBase 0.19.0, 5 slaves and 1 master,  DELL2950 server
>> with 4GB memory and 1TB disk on each node.
>> >>
>> >> Issue 1.
>> >>
>> >> Each time the MapReduce Job eat some files. And I want to get a balance
>> in this pipe-line (I have defined the HBase table with TTL) since many new
>> incoming files need to be eat. But sometimes many following
>> RetriesExhaustedException happen and the job is blocked for a long time. And
>> the the MapReduce job will be behindhand and cannot catch up with the speed
>> of incoming files.
>> >>
>> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact region server Some server for region TABA,113976305209@2009-03-1713:11:04.135,1237268430402, row '113976399906@2009-03-1713:40:12.213', but failed after 10 attempts.
>> >>
>> >> Exceptions:
>> >>
>> >>       at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>> >>       at
>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>> >>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
>> >>       at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
>> >>       at
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
>> >>       at
>> net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
>> >>       at
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:117)
>> >>       at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
>> >>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>> >>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
>> >>
>> >> I check the code and find the retry policy when this exception is like
>> BSD TCP syn backoff policy:
>> >>
>> >>   /**
>> >>    * This is a retry backoff multiplier table similar to the BSD TCP
>> syn
>> >>    * backoff table, a bit more aggressive than simple exponential
>> backoff.
>> >>    */
>> >>   public static int RETRY_BACKOFF[] = { 1, 1, 1, 2, 2, 4, 4, 8, 16, 32
>> };
>> >>
>> >> So, 10 retries will table long time: 71*2=142 seconds. So I think
>> following
>> >> tow parameters can be changed and I reduce them.
>> >> hbase.client.pause = 1 (second)
>> >> hbase.client.retries.number=4
>> >> Now, the block time should be 5 seconds.
>> >>
>> >> In fact, sometimes I have met such exception also when:
>> >> getRegionServerWithRetries, and
>> >> getRegionLocationForRowWithRetries, and
>> >> processBatchOfRows
>> >>
>> >> I want to know what cause above exception happen. Region spliting?
>> >> Compaction? or Region moving to other regionserver?
>> >> This exception happens very frequently in my cluster.
>> >>
>> >> Issue2: Also above workflow, I found the regionserver nodes are very
>> busy
>> >> wigh very heavy load.
>> >> The process of HRegionServer takes almost all of the used CPU load.
>> >> In my previous test round, I ran 4 maptask to insert data in each node,
>> now
>> >> I just run 2 maptask in each node. I think the performance of
>> HRegionServer
>> >> is not so good to achieve some of my applications. Because except for
>> the
>> >> insert/input data procedure, we will query rows from the same table for
>> >> online applications, which need low-latency. I my test experience, I
>> find
>> >> the online query (scan about 50 rows with startRow and endRow) latency
>> is
>> >> 2-6 seconds for the first query, and about 500ms for the re-query. And
>> when
>> >> the cluster is busy for inserting data to table, the query latency may
>> as
>> >> long as 10-30 seconds. And when the table is in sliping/compaction,
>> maybe
>> >> longer.
>> >>
>> >>
>> >>
>> >
>>
>
>

Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by schubert zhang <zs...@gmail.com>.
Jean Daniel,
Thank you very much. I will set up my cluster again with HBase 0.19.1
immediately.
And I will also change my swappiness configuration immediately (sketched
below).
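
For the record, the swappiness change is standard Linux sysctl usage,
nothing HBase-specific (20 is the value J-D suggested):

  # Lower the kernel's tendency to swap; takes effect immediately.
  sysctl -w vm.swappiness=20
  # To persist across reboots, add this line to /etc/sysctl.conf:
  #   vm.swappiness = 20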

For the RetriesExhaustedException, I double-checked the logs of each node.

For example, one such exception occurred at 09/03/17 16:29:10 for region
13975557938@2009-03-17 13:42:36.412:

09/03/17 16:29:10 INFO mapred.JobClient:  map 69% reduce 0%
09/03/17 16:29:10 INFO mapred.JobClient: Task Id :
attempt_200903171247_0387_m_000009_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007, row
'13975559447@2009-03-17 16:25:42.772', but failed after 4 attempts.
Exceptions:

        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
        at
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
        at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
        at
net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:21)
        at
net.sandmill.control.TableRowRecordWriter.write(TableRowRecordWriter.java:11)
        at
org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:395)
        at net.sandmill.control.IndexingJob.map(IndexingJob.java:123)
        at net.sandmill.control.IndexingJob.map(IndexingJob.java:27)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)

HMaster log:
2009-03-17 16:29:16,501 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_CLOSE:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
10.24.1.20:60020
2009-03-17 16:29:18,707 INFO org.apache.hadoop.hbase.master.RegionManager:
assigning region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 to
server 10.24.1.14:60020
2009-03-17 16:29:19,795 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.24.1.12:60020}
2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 34 row(s) of meta region {regionname:
.META.,,1, startKey: <>, server: 10.24.1.12:60020} complete
2009-03-17 16:29:19,857 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_PROCESS_OPEN:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
10.24.1.14:60020
2009-03-17 16:29:21,723 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 from
10.24.1.14:60020
2009-03-17 16:29:21,723 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 open on
10.24.1.14:60020
2009-03-17 16:29:21,723 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007 in region .META.,,1
with startcode 1237274685627 and server 10.24.1.14:60020

one node:
2009-03-17 16:28:43,617 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
2009-03-17 16:28:43,617 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
TABA,13975557938@2009-03-17 13:42:36.412,1237270434007
2009-03-17 16:28:44,604 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region TABA,13975557938@2009-03-17 13:42:36.412,1237270434007/168627948
available
2009-03-17 16:29:01,682 INFO org.apache.hadoop.hbase.regionserver.HLog:
Closed
hdfs://nd0-rack0-cloud:9000/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278496283,
entries=100008. New log writer:
/hbase/log_10.24.1.14_1237274685627_60020/hlog.dat.1237278541680
2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
TABA,13675519693@2009-03-17 13:39:35.123,1237270466558 in 28sec
2009-03-17 16:29:04,920 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting  compaction on region
TABA,13702099018@2009-03-17 14:00:04.791,1237270570714
2009-03-17 16:29:04,927 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region
TABA,13702099018@2009-03-17 14:00:04.791,1237270570714 in 0sec

another node:
2009-03-17 16:28:56,462 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting  compaction on region
CDR,13975557938@2009-03-17 13:42:36.412,1237270434007
2009-03-17 16:28:58,304 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
2009-03-17 16:28:58,305 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
2009-03-17 16:28:58,305 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
CDR,13775569214@2009-03-17 13:13:57.898,1237278535949
2009-03-17 16:28:58,343 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region CDR,13775569214@2009-03-17 13:13:57.898,1237278535949/1626944070
available
2009-03-17 16:28:58,343 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
CDR,13775545668@2009-03-17 13:03:35.830,1237278535949
2009-03-17 16:28:58,380 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor
(... the CodecPool "Got brand-new decompressor" line repeats 12 times
between 16:28:58,380 and 16:28:58,387 ...)
2009-03-17 16:28:58,388 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region CDR,13775545668@2009-03-17 13:03:35.830,1237278535949/1656930599
available
2009-03-17 16:29:01,325 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
CDR,13975557938@2009-03-17 13:42:36.412,1237270434007: Overloaded
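
So at least some of the retries seem to line up with the master closing
a region as "Overloaded" and reassigning it to another server, i.e.
plain region rebalancing, not only splits.

BTW, one thing that may reduce the per-row RPC load on the
regionservers is to let HTable buffer commits on the client. Below is a
rough sketch against the 0.19 client API -- method names are from
memory (setAutoFlush, setWriteBufferSize), and the "info:value" family
and row keys are made up for illustration, so please check against your
javadoc:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.io.BatchUpdate;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedInsert {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      HTable table = new HTable(conf, "TABA");
      table.setAutoFlush(false);                 // queue updates client-side
      table.setWriteBufferSize(4 * 1024 * 1024); // flush in ~4MB batches
      for (int i = 0; i < 100000; i++) {
        BatchUpdate bu = new BatchUpdate("row-" + i);  // hypothetical row key
        bu.put("info:value", Bytes.toBytes("v" + i));  // hypothetical family:qualifier
        table.commit(bu);                        // buffered, not one RPC per row
      }
      table.flushCommits();                      // push whatever is left in the buffer
    }
  }

The tradeoff is that a client crash loses whatever is still sitting in
the buffer, so this only fits re-runnable jobs like this import.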


Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Schubert,

Yeah, latency has always been an issue in HBase, since reliability was
our focus. This is going to be fixed in 0.20 with a new file format and
caching of hot cells.

Regarding your problem, please enable DEBUG and look in your
regionserver logs to see what's happening. Usually, if you get a
RetriesExhaustedException, it's because something is taking too long to
get a region reassigned.
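
For example, something like the following line in conf/log4j.properties
(from memory -- compare with the file that ships with your release)
turns on DEBUG for all the HBase classes:

  log4j.logger.org.apache.hadoop.hbase=DEBUG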

I also suggest that you upgrade to HBase 0.19.1 candidate 2, which has
two fixes for massive uploads like yours.

I also see that there was some swapping; please try setting the
swappiness to something like 20. It helped me a lot!
http://www.sollers.ca/blog/2008/swappiness/
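
On most Linux boxes that is, as root:

  sysctl -w vm.swappiness=20

and adding "vm.swappiness = 20" to /etc/sysctl.conf makes it stick
across reboots.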

J-D


Re: Tow issues for sharing, when using MapReduce to insert rows to HBase

Posted by schubert zhang <zs...@gmail.com>.
This the "top" info of a regionserver/datanode/tasktracker node.We can see
the HRegionServer node is very heavy-loaded.

top - 15:44:17 up 23:46,  3 users,  load average: 1.72, 0.55, 0.23
Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
Cpu(s): 56.2%us,  3.1%sy,  0.0%ni, 26.6%id, 12.7%wa,  0.2%hi,  1.2%si,
 0.0%st
Mem:   4043484k total,  2816028k used,  1227456k free,    18308k buffers
Swap:  2097144k total,    22944k used,  2074200k free,  1601760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

21797 schubert  24   0 3414m 796m 9848 S  218 20.2   2:55.21 java
(HRegionServer)
20545 schubert  24   0 3461m  98m 9356 S   15  2.5  10:49.64 java (DataNode)

22677 schubert  24   0 1417m 107m 9224 S    9  2.7   0:08.86 java (MapReduce
Child)
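
To see where the regionserver burns its CPU, a thread dump may help;
for example (jstack ships with the Sun JDK 6):

  jstack 21797 > /tmp/regionserver-threads.txt

Taking a couple of dumps a few seconds apart shows which threads (RPC
handlers, flusher, compactor) stay busy.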
