You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:15:43 UTC
[jira] [Created] (TRAFODION-347) LP Bug: 1326127 - Large query sees
hbase ScannerTimeoutException at ExpHbaseInterface::fetchNextRow()
Alice Chen created TRAFODION-347:
------------------------------------
Summary: LP Bug: 1326127 - Large query sees hbase ScannerTimeoutException at ExpHbaseInterface::fetchNextRow()
Key: TRAFODION-347
URL: https://issues.apache.org/jira/browse/TRAFODION-347
Project: Apache Trafodion
Issue Type: Bug
Components: sql-exe
Reporter: Weishiun Tsai
Priority: Critical
With the current default hbase settings, a larger query often returns ScannerTimeoutException and fails to complete. Here is an example of running such query on a 6-node machine using QA’s g_tpch2x tables.
This case is to document this problem, but the solution may be a combination of many possible fixes or tuning. It could be to come up with a recommendation of better hbase attribute settings, to deal with such timeout error internally from Trafodion’s org.trafodion.sql.HBaseAccess.HTableClient.fetchNextRow(), to come up with a better query plan, to improve the performance of the executor, or some other possible ways of improving the situation.
----------------------------------------------------------------
Here is the entire script to reproduce this problem (it requires g_tpch2x tables to be loaded first)
select count(*) from trafodion.g_tpch2x.part;
select count(*) From trafodion.g_tpch2x.supplier;
select count(*) from trafodion.g_tpch2x.partsupp;
select count(*) from trafodion.g_tpch2x.lineitem;
prepare xx from
select * from trafodion.g_tpch2x.part p1, trafodion.g_tpch2x.supplier s1, trafodion.g_tpch2x.partsupp, trafodion.g_tpch2x.part p2, trafodion.g_tpch2x.supplier s2, trafodion.g_tpch2x.lineitem
where p1.p_name='azure peach dodger frosted lavender'
and s1.s_name='Supplier#000000123'
and p1.p_partkey=ps_partkey and ps_partkey=p2.p_partkey and p2.p_partkey=l_partkey
and s1.s_suppkey=ps_suppkey and ps_suppkey=s2.s_suppkey and s2.s_suppkey=l_suppkey;
explain options 'f' xx;
sh date;
execute xx;
sh date;
----------------------------------------------------------------
Here is the execution output with the error:
>>select count(*) from trafodion.g_tpch2x.part;
(EXPR)
--------------------
400000
--- 1 row(s) selected.
>>select count(*) From trafodion.g_tpch2x.supplier;
(EXPR)
--------------------
20000
--- 1 row(s) selected.
>>select count(*) from trafodion.g_tpch2x.partsupp;
(EXPR)
--------------------
1600000
--- 1 row(s) selected.
>>select count(*) from trafodion.g_tpch2x.lineitem;
(EXPR)
--------------------
11997996
--- 1 row(s) selected.
>>prepare xx from
+>select * from trafodion.g_tpch2x.part p1, trafodion.g_tpch2x.supplier s1, trafodion.g_tpch2x.partsupp, trafodion.g_tpch2x.part p2, trafodion.g_tpch2x.supplier s2, trafodion.g_tpch2x.lineitem
+>where p1.p_name='azure peach dodger frosted lavender'
+>and s1.s_name='Supplier#000000123'
+>and p1.p_partkey=ps_partkey and ps_partkey=p2.p_partkey and p2.p_partkey=l_partkey
+>and s1.s_suppkey=ps_suppkey and ps_suppkey=s2.s_suppkey and s2.s_suppkey=l_suppkey;
*** WARNING[6007] Multi-column statistics for columns (PS_PARTKEY, PS_SUPPKEY) from table TRAFODION.G_TPCH2X.PARTSUPP were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.
*** WARNING[6007] Multi-column statistics for columns (L_PARTKEY, L_SUPPKEY) from table TRAFODION.G_TPCH2X.LINEITEM were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.
--- SQL command prepared.
>>
>>explain options 'f' xx;
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------
16 . 17 root 2.50E+001
15 1 16 hybrid_hash_join 2.50E+001
14 2 15 hybrid_hash_join 1.00E+000
13 . 14 esp_exchange 1:6(hash2) 4.00E+000
12 4 13 hybrid_hash_join 4.00E+000
11 . 12 esp_exchange 6(hash2):6(hash2) 4.00E+000
10 6 11 hybrid_hash_join 4.00E+000
8 9 10 nested_join 4.00E+000
. . 9 trafodion_scan PARTSUPP 4.00E+000
7 . 8 esp_exchange 6(hash2):6(range) 1.00E+000
. . 7 trafodion_scan PART 1.00E+000
5 . 6 esp_exchange 6(hash2):6(range) 4.00E+005
. . 5 trafodion_scan PART 4.00E+005
3 . 4 esp_exchange 6(hash2):6(range) 2.00E+004
. . 3 trafodion_scan SUPPLIER 2.00E+004
. . 2 trafodion_scan SUPPLIER 1.00E+000
. . 1 trafodion_scan LINEITEM 1.00E+002
--- SQL operation complete.
>>
>>sh date;
Tue Jun 3 11:03:28 PDT 2014
>>
>>execute xx;
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::fetchNextRow returned error HBASE_ACCESS_ERROR(-705). Cause:
org.apache.hadoop.hbase.client.ScannerTimeoutException: 89265ms passed since the last invocation, timeout is currently set to 60000
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:283)
org.trafodion.sql.HBaseAccess.HTableClient.fetchNextRow(HTableClient.java:352)
.
--- 0 row(s) selected.
>>
>>sh date;
Tue Jun 3 11:31:01 PDT 2014
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)