Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2019/08/30 12:40:00 UTC

[jira] [Created] (IMPALA-8912) Avoid duplicate computeStats on HBaseScanNode

Quanlong Huang created IMPALA-8912:
--------------------------------------

             Summary: Avoid duplicate computeStats on HBaseScanNode
                 Key: IMPALA-8912
                 URL: https://issues.apache.org/jira/browse/IMPALA-8912
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Quanlong Huang


For simple queries on HBase tables where an HBaseScanNode is the root of the SingleNodePlan, HBaseScanNode#computeStats is called twice.

Stacktrace for the first call:
{code}
        at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
        at org.apache.impala.planner.HBaseScanNode.init(HBaseScanNode.java:160)
        at org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1405)
        at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1582)
        at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:826)
        at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:662)
        at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
        at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
        at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
        at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
{code}

Stacktrace for the second call:
{code}
        at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
        at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:307)
        at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
        at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
        at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:154)
{code}

Code for the second call:
{code:java}
  private PlanNode createQueryPlan(QueryStmt stmt, Analyzer analyzer, boolean disableTopN)
      throws ImpalaException {
    ......
    if (stmt.evaluateOrderBy() && sortHasMaterializedSlots) {
      root = createSortNode(analyzer, root, stmt.getSortInfo(), stmt.getLimit(),
          stmt.getOffset(), stmt.hasLimit(), disableTopN);
    } else {
      root.setLimit(stmt.getLimit());
      root.computeStats(analyzer);   // <--- May call HBaseScanNode#computeStats here
    }

    return root;
  }
{code}
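The following is a minimal, self-contained Java model of the two calls traced above. It is only a sketch under stated assumptions: `ScanNodeModel`, `PlannerModel`, `estimateRowsFromHBase()` and the row estimate of 1000 are hypothetical stand-ins rather than Impala classes. The HBase RPCs are simulated by a counter so that the duplicate work is visible.

```java
// Hypothetical model of the planner flow in the stack traces above.
// Names are illustrative only; they are NOT Impala's actual classes.
class ScanNodeModel {
  int hbaseRpcs = 0;        // counts simulated round trips to HBase
  long limit = -1;          // -1 means "no limit"
  long cardinality = -1;

  // Stands in for HBaseScanNode#init, which calls computeStats().
  void init() { computeStats(); }

  void setLimit(long limit) { this.limit = limit; }

  // Stands in for HBaseScanNode#computeStats: every call pays for the RPCs.
  void computeStats() {
    long estimated = estimateRowsFromHBase();
    cardinality = (limit >= 0) ? Math.min(estimated, limit) : estimated;
  }

  long estimateRowsFromHBase() {
    hbaseRpcs++;            // simulated RPC
    return 1000;            // made-up row estimate
  }
}

class PlannerModel {
  // Mirrors the non-sort branch of createQueryPlan().
  static ScanNodeModel plan(long limit) {
    ScanNodeModel root = new ScanNodeModel();
    root.init();            // first computeStats(), via init()
    root.setLimit(limit);
    root.computeStats();    // second computeStats(), after setLimit()
    return root;
  }
}
```

Planning with a limit of 1 leaves `hbaseRpcs` at 2 (one round from `init()`, one from the non-sort branch), even though the only new input on the second call is the limit.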

Such queries are usually point queries and are expected to return quickly. HBaseScanNode#computeStats is expensive because it requires RPCs to HBase, so we should avoid calling it twice.
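One possible direction, sketched here as a hypothetical illustration and not as the fix adopted for this issue, is to cache the expensive HBase estimate on the node and re-apply only the cheap limit capping when computeStats is called again. `CachingScanNodeModel` and `estimateRowsFromHBase()` are invented names, and the RPC is simulated by a counter.

```java
// Hypothetical sketch: cache the RPC-derived estimate, recompute only the
// limit-dependent part on later calls. Not Impala's actual implementation.
class CachingScanNodeModel {
  int hbaseRpcs = 0;                 // counts simulated round trips to HBase
  long limit = -1;                   // -1 means "no limit"
  long cardinality = -1;
  private Long cachedEstimate = null; // null until the first (simulated) RPC

  void init() { computeStats(); }

  void setLimit(long limit) { this.limit = limit; }

  void computeStats() {
    if (cachedEstimate == null) {
      cachedEstimate = estimateRowsFromHBase(); // RPCs happen only once
    }
    long estimated = cachedEstimate;
    // Cheap step that must still run on every call: apply the current limit.
    cardinality = (limit >= 0) ? Math.min(estimated, limit) : estimated;
  }

  private long estimateRowsFromHBase() {
    hbaseRpcs++;                     // simulated RPC
    return 1000;                     // made-up estimate
  }
}
```

Running `init()`, then `setLimit(1)`, then `computeStats()` issues a single simulated RPC round and still yields a cardinality of 1. Caching like this is only safe if the inputs to the estimate (for example the scan's key ranges) cannot change between the two calls; otherwise the cache would need to be invalidated.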



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
