You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Vikas Vishwakarma (JIRA)" <ji...@apache.org> on 2014/08/28 06:45:57 UTC
[jira] [Created] (PHOENIX-1215) some transactions are hanging indefinitely on

Vikas Vishwakarma created PHOENIX-1215:
------------------------------------------

             Summary: some transactions are hanging indefinitely on 
                 Key: PHOENIX-1215
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1215
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.0.0
         Environment: Num Cores: 24, Model : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
MemTotal:       65801264 kB, 1 Gbps Network
Red Hat Enterprise Linux Server release 6.2

            Reporter: Vikas Vishwakarma


Test Setup:
==========
4 load client with 16 threads each running against a HBase cluster with 6 RegionServers. 
Each load client spawns a thread and runs a upsert query in batch mode

Pseudo code:
==========
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME  (ROW_KEY_ID, ROW_VAL) VALUES (?, ?)

    for (int i=0; i<6400; i++) {
      upsertSql.append(SQL_UPSERT_HBASE_DATA);
      ps = conn.prepareStatement(preparedSqlTableStmt.toString());
      ps.setString(1, rowKey);
      ps.setString(2, rowVal);
      ps.execute();
    }
    conn.commit();
    tryClose(conn);

Observation:
==============
Observing some issues where a small number of transactions (approx 40 out of 80,000) are hanging indefinitely on preparedStatement.execute() call. 

I tried to put a timeout on the transaction but looks like phoenix still does not support preparedStatement.setQueryTimeout().

Log Analysis:
=============
On the client side there are no errors or exceptions except some threads in hung state 

On the service side, the observation is as follows:
Also after some log analysis,  it looks like all the batches that got stuck, are during the start of the test. From the RegionServer logs, the phoenix co-processor loading and index updates is in progress during this time (which involves some failures and retries). Once this phase is over, remaining write batches that started later have succeeded.

RegionServer log snapshots during the time period this issue was observed on the client side:
---------
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
....
2014-08-27 03:28:06,507 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and priority 1
2014-08-27 03:28:06,507 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 successfully.
....
2014-08-27 03:28:06,508 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null and priority 1
2014-08-27 03:28:06,508 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
..
2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished scanning 0 rows for ungrouped coprocessor scan {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}



--
This message was sent by Atlassian JIRA
(v6.2#6252)