Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:20:24 UTC

[jira] [Created] (TRAFODION-1207) LP Bug: 1449190 - hbase split starvation due to transactions.

Alice Chen created TRAFODION-1207:
-------------------------------------

             Summary: LP Bug: 1449190 - hbase split starvation due to transactions.
                 Key: TRAFODION-1207
                 URL: https://issues.apache.org/jira/browse/TRAFODION-1207
             Project: Apache Trafodion
          Issue Type: Bug
          Components: dtm
            Reporter: Guy Groulx
            Assignee: Oliver Bucaojit
            Priority: Critical
             Fix For: 2.0-incubating


We ran a longevity test on a system running OE with 512 drivers.
Our maximum HFile size was set to 10GB.

After a while, we noticed the following messages in some of the HBase RegionServer logs:
2015-04-27 10:35:06,990 INFO  [regionserver60020-splits-1430121725808] transactional.TrxRegionObserver: Delaying split due to transactions present. Delayed : 153 minute(s) on TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.a1e39f281243d24c45d615c1b950f2a8.
2015-04-27 10:35:13,926 INFO  [regionserver60020-splits-1430123472882] transactional.TrxRegionObserver: Delaying split due to transactions present. Delayed : 124 minute(s) on TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.a1e39f281243d24c45d615c1b950f2a8.

Looking at the HDFS GUI:
Contents of directory /apps/hbase/data/data/default/TRAFODION.JAVABENCH.OE_ORDERLINE_512/a1e39f281243d24c45d615c1b950f2a8:

Name                                Type   Size
04ae7ce619d24b0094d85d5c39ebf8a6    file   72.49 MB
559232b70b5340ddaa289a30dc4d7d2c    file   14.66 GB    <== This is 14.66 GB, well over the 10GB maximum.
6c253a61ee344b1bb39d2f3a669103d3    file   72.56 MB
8837fc13d3a241d493b3ddcbd160d869    file   72.49 MB
901c5708daa048599de8d1441ed5ea89    file   72.48 MB
b9d4bb1179414f9686f1f3271a2b434b    file   72.56 MB
bbe2057994194a2693897bb5323a89fd    file   72.56 MB

Notice that the second entry is over 10GB. It cannot split because there are active transactions, and because our 512 drivers never let up, the split is starved.
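A minimal sketch of why the split starves, assuming a simplified model in which time advances in integer ticks and a split may run only at a tick with zero transactions in flight (the "Delaying split due to transactions present" rule in the log messages above). Txn, staggered_workload and first_idle_tick are hypothetical names for illustration, not Trafodion code.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    start: int  # tick at which the transaction begins
    end: int    # tick at which it commits (exclusive)


def staggered_workload(drivers, txn_len, think_time, stop_at):
    """Each driver runs transactions of `txn_len` ticks separated by
    `think_time` idle ticks, offset from the previous driver by one tick.
    No driver starts a new transaction at or after `stop_at`."""
    txns = []
    for d in range(drivers):
        start = d
        while start < stop_at:
            txns.append(Txn(start, start + txn_len))
            start += txn_len + think_time
    return txns


def first_idle_tick(txns, split_requested_at, horizon):
    """Return the first tick in [split_requested_at, horizon) at which no
    transaction is active, i.e. the earliest point a split that waits for
    zero in-flight transactions could run. Return None if the region is
    busy for the whole interval."""
    for tick in range(split_requested_at, horizon):
        if not any(t.start <= tick < t.end for t in txns):
            return tick
    return None


# One driver leaves idle gaps, so a deferred split runs quickly.
lone = staggered_workload(drivers=1, txn_len=3, think_time=2, stop_at=100)
print(first_idle_tick(lone, split_requested_at=1, horizon=200))   # → 3

# Eight overlapping drivers cover every think-time gap: the split can
# only run after the drivers stop and the last transaction commits.
busy = staggered_workload(drivers=8, txn_len=3, think_time=2, stop_at=100)
print(first_idle_tick(busy, split_requested_at=10, horizon=100))  # → None
print(first_idle_tick(busy, split_requested_at=10, horizon=200))  # → 102
```

In this toy model no single driver is busy all the time, yet together they never leave the region idle, which matches the behavior described above: the split only gets its chance once the load stops.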

Once we killed the drivers, stopping new transactions, the split happened almost instantly.
2015-04-27 17:48:57,235 INFO  [regionserver60020-splits-1430135811050] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430066791987.2fb76848eeb9b1516ae7a80500e8870c., new regions: TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1430135811140.4cd059f95ecba06425a5c39592de4157., TRAFODION.JAVABENCH.OE_ORDERLINE_512,\x00\x00\x00\x0F\x80\x00\x00\xE8\x80\x00\x00\x05\x80\x00\x000\x80\x00\x00\x07,1430135811140.62d1d89a34607dfde0ec66d18ad6e91f.. Split took 5hrs, 52mins, 6sec

The log above reports 5 hrs 52 mins, but the split actually took less than a minute once the transactions stopped.

We understand that a split must be delayed until active transactions have completed, but in high-transaction environments we need to guarantee a window in which splits can actually happen.
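One possible way to provide such a window, sketched under the same simplified tick-based model: once a split has been pending longer than a threshold, stop admitting new transactions on that region until the in-flight ones finish. SplitWindowGate, max_split_delay and the method names are hypothetical, not Trafodion or HBase APIs, and this is not necessarily how this issue was resolved.

```python
class SplitWindowGate:
    """Hypothetical per-region admission gate (not Trafodion's actual code).

    As today, a requested split waits until no transactions are in flight.
    Once the split has been pending for `max_split_delay` ticks, the gate
    refuses new transactions so the in-flight ones can drain, which
    guarantees the split a window even under a constant load.
    """

    def __init__(self, max_split_delay):
        self.max_split_delay = max_split_delay
        self.active = 0                 # transactions currently in flight
        self.split_requested_at = None  # tick of the pending split, if any

    def request_split(self, now):
        # Keep the earliest request so repeated requests don't reset the clock.
        if self.split_requested_at is None:
            self.split_requested_at = now

    def _draining(self, now):
        return (self.split_requested_at is not None
                and now - self.split_requested_at >= self.max_split_delay)

    def try_begin(self, now):
        """Admit a new transaction at tick `now`, or return False if the
        region is draining for a split (the caller should retry later)."""
        if self._draining(now):
            return False
        self.active += 1
        return True

    def end(self):
        """Record that an in-flight transaction committed or aborted."""
        if self.active == 0:
            raise RuntimeError("end() called with no active transaction")
        self.active -= 1

    def try_split(self):
        """Run the pending split if nothing is in flight; return True if it ran."""
        if self.split_requested_at is None or self.active > 0:
            return False
        self.split_requested_at = None  # split done; admissions resume
        return True


gate = SplitWindowGate(max_split_delay=10)
gate.try_begin(now=0)          # transaction A starts
gate.request_split(now=1)      # split deferred: A is in flight
gate.try_begin(now=5)          # B admitted: split pending only 4 ticks
print(gate.try_begin(now=11))  # → False: pending 10 ticks, now draining
gate.end()                     # A commits
gate.end()                     # B commits
print(gate.try_split())        # → True: region idle, split runs
print(gate.try_begin(now=12))  # → True: admissions resume
```

The design choice here is to pause new admissions rather than abort in-flight work: new transactions on one region stall briefly, in exchange for a bounded split delay. A real threshold would presumably need tuning relative to typical transaction length.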



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)