You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:15:43 UTC

[jira] [Created] (TRAFODION-347) LP Bug: 1326127 - Large query sees hbase ScannerTimeoutException at ExpHbaseInterface::fetchNextRow()

Alice Chen created TRAFODION-347:
------------------------------------

             Summary: LP Bug: 1326127 - Large query sees hbase ScannerTimeoutException at ExpHbaseInterface::fetchNextRow()
                 Key: TRAFODION-347
                 URL: https://issues.apache.org/jira/browse/TRAFODION-347
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
            Reporter: Weishiun Tsai
            Priority: Critical


With the current default hbase settings, a larger query often returns ScannerTimeoutException and fails to complete.  Here is an example of running such query on a 6-node machine using QA’s g_tpch2x tables.

This case is to document this problem, but the solution may be a combination of many possible fixes or tuning.   It could be to come up with a recommendation of better hbase attribute settings, to deal with such timeout error internally from Trafodion’s org.trafodion.sql.HBaseAccess.HTableClient.fetchNextRow(), to come up with a better query plan, to improve the performance of the executor, or some other possible ways of improving the situation.

----------------------------------------------------------------

Here is the entire script to reproduce this problem (it requires g_tpch2x tables to be loaded first)

select count(*) from trafodion.g_tpch2x.part;
select count(*) From trafodion.g_tpch2x.supplier;
select count(*) from trafodion.g_tpch2x.partsupp;
select count(*) from trafodion.g_tpch2x.lineitem;

prepare xx from
select * from trafodion.g_tpch2x.part p1, trafodion.g_tpch2x.supplier s1, trafodion.g_tpch2x.partsupp, trafodion.g_tpch2x.part p2, trafodion.g_tpch2x.supplier s2, trafodion.g_tpch2x.lineitem
where p1.p_name='azure peach dodger frosted lavender'
and s1.s_name='Supplier#000000123'
and p1.p_partkey=ps_partkey and ps_partkey=p2.p_partkey and p2.p_partkey=l_partkey
and s1.s_suppkey=ps_suppkey and ps_suppkey=s2.s_suppkey and s2.s_suppkey=l_suppkey;

explain options 'f' xx;

sh date;

execute xx;

sh date;

----------------------------------------------------------------

Here is the execution output with the error:

>>select count(*) from trafodion.g_tpch2x.part;

(EXPR)
--------------------

              400000

--- 1 row(s) selected.
>>select count(*) From trafodion.g_tpch2x.supplier;

(EXPR)
--------------------

               20000

--- 1 row(s) selected.
>>select count(*) from trafodion.g_tpch2x.partsupp;

(EXPR)
--------------------

             1600000

--- 1 row(s) selected.
>>select count(*) from trafodion.g_tpch2x.lineitem;

(EXPR)
--------------------

            11997996

--- 1 row(s) selected.
>>prepare xx from
+>select * from trafodion.g_tpch2x.part p1, trafodion.g_tpch2x.supplier s1, trafodion.g_tpch2x.partsupp, trafodion.g_tpch2x.part p2, trafodion.g_tpch2x.supplier s2, trafodion.g_tpch2x.lineitem
+>where p1.p_name='azure peach dodger frosted lavender'
+>and s1.s_name='Supplier#000000123'
+>and p1.p_partkey=ps_partkey and ps_partkey=p2.p_partkey and p2.p_partkey=l_partkey
+>and s1.s_suppkey=ps_suppkey and ps_suppkey=s2.s_suppkey and s2.s_suppkey=l_suppkey;

*** WARNING[6007] Multi-column statistics for columns (PS_PARTKEY, PS_SUPPKEY) from table TRAFODION.G_TPCH2X.PARTSUPP were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

*** WARNING[6007] Multi-column statistics for columns (L_PARTKEY, L_SUPPKEY) from table TRAFODION.G_TPCH2X.LINEITEM were not available. The columns were being used by Join operator. As a result, the access path chosen might not be the best possible.

--- SQL command prepared.
>>
>>explain options 'f' xx;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

16   .    17   root                                                  2.50E+001
15   1    16   hybrid_hash_join                                      2.50E+001
14   2    15   hybrid_hash_join                                      1.00E+000
13   .    14   esp_exchange                    1:6(hash2)            4.00E+000
12   4    13   hybrid_hash_join                                      4.00E+000
11   .    12   esp_exchange                    6(hash2):6(hash2)     4.00E+000
10   6    11   hybrid_hash_join                                      4.00E+000
8    9    10   nested_join                                           4.00E+000
.    .    9    trafodion_scan                  PARTSUPP              4.00E+000
7    .    8    esp_exchange                    6(hash2):6(range)     1.00E+000
.    .    7    trafodion_scan                  PART                  1.00E+000
5    .    6    esp_exchange                    6(hash2):6(range)     4.00E+005
.    .    5    trafodion_scan                  PART                  4.00E+005
3    .    4    esp_exchange                    6(hash2):6(range)     2.00E+004
.    .    3    trafodion_scan                  SUPPLIER              2.00E+004
.    .    2    trafodion_scan                  SUPPLIER              1.00E+000
.    .    1    trafodion_scan                  LINEITEM              1.00E+002

--- SQL operation complete.
>>
>>sh date;
Tue Jun  3 11:03:28 PDT 2014
>>
>>execute xx;

*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::fetchNextRow returned error HBASE_ACCESS_ERROR(-705). Cause:
org.apache.hadoop.hbase.client.ScannerTimeoutException: 89265ms passed since the last invocation, timeout is currently set to 60000
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:283)
org.trafodion.sql.HBaseAccess.HTableClient.fetchNextRow(HTableClient.java:352)
.

--- 0 row(s) selected.
>>
>>sh date;
Tue Jun  3 11:31:01 PDT 2014



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)