Posted to dev@phoenix.apache.org by "Vikas Vishwakarma (JIRA)" <ji...@apache.org> on 2014/08/28 07:02:58 UTC

[jira] [Updated] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on

     [ https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-1215:
---------------------------------------

    Summary: some phoenix transactions are hanging indefinitely on   (was: some transactions are hanging indefinitely on )

> some phoenix transactions are hanging indefinitely on 
> ------------------------------------------------------
>
>                 Key: PHOENIX-1215
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1215
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Num Cores: 24, Model : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
> MemTotal:       65801264 kB, 1 Gbps Network
> Red Hat Enterprise Linux Server release 6.2
>            Reporter: Vikas Vishwakarma
>
> Test Setup:
> ==========
> 4 load clients with 16 threads each, running against an HBase cluster with 6 RegionServers.
> Each load client spawns threads, and each thread runs an upsert query in batch mode.
> Pseudo code:
> ==========
> String SQL_UPSERT_HBASE_DATA = "UPSERT INTO TABLE_NAME (ROW_KEY_ID, ROW_VAL) VALUES (?, ?)";
>     // Prepare the upsert once and reuse it for each row in the batch
>     PreparedStatement ps = conn.prepareStatement(SQL_UPSERT_HBASE_DATA);
>     for (int i = 0; i < 6400; i++) {
>       ps.setString(1, rowKey);
>       ps.setString(2, rowVal);
>       ps.execute();   // the call that hangs for ~40 of the ~80,000 batches
>     }
>     conn.commit();    // Phoenix sends the buffered upserts on commit
>     tryClose(conn);
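> For comparison, the same loop could use standard JDBC batching; this is a minimal sketch of an assumed alternative (assuming the driver supports JDBC batch calls), not the actual test code:
>     PreparedStatement ps = conn.prepareStatement(SQL_UPSERT_HBASE_DATA);
>     for (int i = 0; i < 6400; i++) {
>       ps.setString(1, rowKey);
>       ps.setString(2, rowVal);
>       ps.addBatch();        // queue the row instead of executing it now
>     }
>     ps.executeBatch();      // send the queued upserts
>     conn.commit();
>     tryClose(conn);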
> Observation:
> ==============
> A small number of transactions (approximately 40 out of 80,000) hang indefinitely on the preparedStatement.execute() call.
> I tried to put a timeout on the transaction, but it looks like Phoenix does not yet support preparedStatement.setQueryTimeout().
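> As a client-side workaround (a minimal sketch, not a Phoenix API; the single-thread executor and the 60-second budget are assumptions), the execute() call can be bounded with a Future so a hung upsert does not block the load thread forever:
> import java.util.concurrent.*;
>
>     ExecutorService executor = Executors.newSingleThreadExecutor();
>     // ps is the prepared upsert from the pseudo code above (must be final)
>     Future<Boolean> result = executor.submit(new Callable<Boolean>() {
>         public Boolean call() throws Exception {
>             return ps.execute();            // the call that sometimes hangs
>         }
>     });
>     try {
>         result.get(60, TimeUnit.SECONDS);   // assumed 60-second budget
>     } catch (TimeoutException te) {
>         result.cancel(true);                // interrupt the hung execute()
>     } catch (InterruptedException | ExecutionException e) {
>         throw new RuntimeException(e);
>     }
>     executor.shutdownNow();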
> Log Analysis:
> =============
> On the client side there are no errors or exceptions, only some threads in a hung state.
> On the server side, the observation is as follows:
> After some log analysis, it looks like all of the batches that got stuck did so during the start of the test. From the RegionServer logs, Phoenix coprocessor loading and index updates were in progress during this time (which involved some failures and retries). Once this phase was over, the remaining write batches that started later succeeded.
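> To confirm where the hung client threads are stuck, a thread dump can be taken with jstack <pid>, or programmatically as in the minimal sketch below (assumption: the hung threads will show Phoenix/HBase client frames under preparedStatement.execute()):
> import java.util.Map;
>
>     // Dump every live thread's stack from inside the load client
>     for (Map.Entry<Thread, StackTraceElement[]> entry :
>             Thread.getAllStackTraces().entrySet()) {
>         System.err.println(entry.getKey());
>         for (StackTraceElement frame : entry.getValue()) {
>             System.err.println("    at " + frame);
>         }
>     }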
> RegionServer log snapshots from the period during which this issue was observed on the client side:
> ---------
> 2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
> 2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
> 2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> 2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> ....
> 2014-08-27 03:28:06,507 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and priority 1
> 2014-08-27 03:28:06,507 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 successfully.
> ....
> 2014-08-27 03:28:06,508 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null and priority 1
> 2014-08-27 03:28:06,508 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 successfully.
> ...
> 2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
> 2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
> ..
> 2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> 2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished scanning 0 rows for ungrouped coprocessor scan {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}



--
This message was sent by Atlassian JIRA
(v6.2#6252)