Posted to issues@trafodion.apache.org by "David Wayne Birdsall (JIRA)" <ji...@apache.org> on 2017/01/19 18:23:26 UTC

[jira] [Created] (TRAFODION-2455) Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)

David Wayne Birdsall created TRAFODION-2455:
-----------------------------------------------

             Summary: Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)
                 Key: TRAFODION-2455
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2455
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.1-incubating
         Environment: A cluster large enough to host a 22 billion row table
            Reporter: David Wayne Birdsall
            Assignee: David Wayne Birdsall


When loading a scale factor 73728 Order Entry database, if UPDATE STATISTICS is done soon after the load on one particular table (the largest table, having 22 billion rows), we get the following failure:

SQLEXCEPTION on Statement, Error Code = -9200
   update statistics for table trafodion.javabench.oe_orderline_73728 on every column, (OL_W_ID, OL_I_ID), (OL_D_ID, OL_W_ID), (OL_D_ID, OL_I_ID) sample
*** ERROR[9200] UPDATE STATISTICS for table TRAFODION.JAVABENCH.OE_ORDERLINE_73728 encountered an error (8448) from statement getRow(). [2017-01-09 02:07:22]
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::coProcAggr returned error HBASE_ACCESS_ERROR(-706). Cause: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, exceptions:
Mon Jan 09 01:47:21 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=73, waitTime=600001, operationTimeout=600000 expired.
Mon Jan 09 01:57:21 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=185, waitTime=600001, operationTimeout=600000 expired.
Mon Jan 09 02:07:22 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=310, waitTime=600001, operationTimeout=600000 expired.

A subsequent update statistics command succeeds, but each of these failures takes a half hour or more before the error is returned.
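The half-hour figure is consistent with the retry settings visible in the log. As a rough check, the sketch below multiplies the attempt count by the operation timeout and converts globalStartTime to local time. All numbers are copied from the log lines above; the fixed UTC-8 offset for PST is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the RpcRetryingCaller log lines above.
GLOBAL_START_MS = 1483954641419   # globalStartTime
OPERATION_TIMEOUT_MS = 600000     # operationTimeout
ATTEMPTS = 3                      # "Failed after attempts=3"

# The log timestamps are in PST; a fixed UTC-8 offset is assumed here.
PST = timezone(timedelta(hours=-8))

# When the first coprocessor call was issued, in log-local time.
start = datetime.fromtimestamp(GLOBAL_START_MS / 1000, tz=PST)

# Lower bound on the time spent before the statement fails:
# every attempt has to run into the full operation timeout.
minimum_wait = timedelta(milliseconds=ATTEMPTS * OPERATION_TIMEOUT_MS)

print(start.strftime("%Y-%m-%d %H:%M:%S"))
print(minimum_wait)
```

The computed start time plus the minimum wait lands within a second or so of the last timestamp in the log (02:07:22), so essentially all of the elapsed time is spent waiting on the three timed-out coprocessor calls.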

Enabling logging for update stats shows that getrowcount returns 0, so update stats assumes the table is small enough to do a select count (*). The plan for this select count (*) (perhaps suffering from the same issue that causes getrowcount to return a non-estimate) chooses the HBase aggregate coprocessor. The table in question has 22 billion rows, so the coprocessor isn't a good choice, and the query times out. But the real issue is: why can't we get a rowcount estimate for this table?
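The failure mode described above can be illustrated with a small sketch. This is not Trafodion's actual code: the function name, the threshold value, and the return labels are hypothetical, chosen only to show how an estimate of 0 is indistinguishable from a genuinely small table.

```python
# Hypothetical cutoff below which an exact count is considered cheap.
SMALL_TABLE_THRESHOLD = 1_000_000


def choose_rowcount_method(estimated_rows: int) -> str:
    """Pick how to obtain the row count, given the estimator's answer.

    Hypothetical illustration of the decision described in this issue:
    a small estimate leads to an exact select count (*).
    """
    if estimated_rows < SMALL_TABLE_THRESHOLD:
        return "select count (*)"
    return "use estimate"


# The table really has 22 billion rows, but the estimator returned 0,
# so the exact-count path is taken anyway.
actual_rows = 22_000_000_000
print(choose_rowcount_method(0))            # path taken in this issue
print(choose_rowcount_method(actual_rows))  # path a correct estimate would take
```

With a correct estimate the second path would be taken, which is why the missing estimate, rather than the coprocessor timeout, is the root problem.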

Rerunning UPDATE STATS on this table a few hours later succeeds.


