Posted to dev@hive.apache.org by "Mithun Radhakrishnan (JIRA)" <ji...@apache.org> on 2014/09/30 22:58:34 UTC

[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator

    [ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153741#comment-14153741 ] 

Mithun Radhakrishnan commented on HIVE-8313:
--------------------------------------------

This appears to be related to the changes introduced in HIVE-4209, which added caching for the evaluation of deterministic sub-expressions.

In this particular case, the problem occurs in {{ExprNodeGenericFuncEvaluator::_evaluate()}}:

{code:title=ExprNodeGenericFuncEvaluator.java|borderStyle=solid}
  @Override
  protected Object _evaluate(Object row, int version) throws HiveException {
    rowObject = row;
    if (ObjectInspectorUtils.isConstantObjectInspector(outputOI) &&
        isDeterministic()) {
      // The output of this UDF is constant, so don't even bother evaluating.
      return ((ConstantObjectInspector)outputOI).getWritableConstantValue();
    }
    for (int i = 0; i < deferredChildren.length; i++) {
      deferredChildren[i].prepare(version);
    }
    return genericUDF.evaluate(deferredChildren);
  }
{code}

In Hive 0.10, the {{deferredChildren[i].evaluate()}} call was skipped entirely for "non-eager" evaluation. In Hive 0.12, that condition is checked inside {{prepare()}}, which is invoked for every child on *each record*, so the overhead grows with the number of children multiplied by the number of rows.

Much of this cost can be avoided by skipping {{prepare()}} for {{ExprNodeEvaluator}}s that yield the same value regardless of the row, e.g. {{ExprNodeConstantEvaluator}} and {{ExprNodeNullEvaluator}}. I'll post a patch for this shortly.
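
To illustrate the idea, here is a minimal, self-contained sketch. It does not reproduce the actual patch or Hive's real classes: {{Evaluator}}, {{ConstantEvaluator}}, {{ColumnEvaluator}}, {{InEvaluator}} and the {{isRowInvariant()}} hook are all hypothetical stand-ins, and {{prepare()}} is reduced to a counter. The assumption is that the set of children needing {{prepare()}} can be computed once, up front, so that the per-row loop touches only the row-dependent ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Hive's real classes.
final class PrepareSkipSketch {

    /** Simplified evaluator: prepare() models the per-row bookkeeping. */
    abstract static class Evaluator {
        long prepareCalls = 0;

        void prepare(int version) { prepareCalls++; }

        /** Assumed hook: true if the result never depends on the row. */
        abstract boolean isRowInvariant();

        abstract Object evaluate(Object row);
    }

    /** Always yields the same value, like a literal in an IN list. */
    static final class ConstantEvaluator extends Evaluator {
        private final Object value;
        ConstantEvaluator(Object value) { this.value = value; }
        @Override boolean isRowInvariant() { return true; }
        @Override Object evaluate(Object row) { return value; }
    }

    /** Yields the row itself, standing in for a column reference. */
    static final class ColumnEvaluator extends Evaluator {
        @Override boolean isRowInvariant() { return false; }
        @Override Object evaluate(Object row) { return row; }
    }

    /** IN-style function: children[0] is the probe, the rest are candidates. */
    static final class InEvaluator {
        private final Evaluator[] children;
        private final Evaluator[] toPrepare;

        InEvaluator(Evaluator[] children, boolean skipRowInvariant) {
            this.children = children;
            // Decide once, at construction time, which children need prepare().
            List<Evaluator> needed = new ArrayList<>();
            for (Evaluator child : children) {
                if (!skipRowInvariant || !child.isRowInvariant()) {
                    needed.add(child);
                }
            }
            this.toPrepare = needed.toArray(new Evaluator[0]);
        }

        boolean evaluate(Object row, int version) {
            for (Evaluator child : toPrepare) {
                child.prepare(version);
            }
            Object probe = children[0].evaluate(row);
            for (int i = 1; i < children.length; i++) {
                if (probe.equals(children[i].evaluate(row))) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Runs the IN check over `rows` rows and totals the prepare() calls. */
    static long countPrepareCalls(int rows, int constants, boolean skipRowInvariant) {
        Evaluator[] children = new Evaluator[constants + 1];
        children[0] = new ColumnEvaluator();
        for (int i = 1; i <= constants; i++) {
            children[i] = new ConstantEvaluator("v" + i);
        }
        InEvaluator in = new InEvaluator(children, skipRowInvariant);
        for (int row = 0; row < rows; row++) {
            in.evaluate("v" + row, row);
        }
        long total = 0;
        for (Evaluator child : children) {
            total += child.prepareCalls;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countPrepareCalls(1000, 50, false)); // every child, every row
        System.out.println(countPrepareCalls(1000, 50, true));  // only the column reference
    }
}
```

With 1000 rows and 50 constants, the sketch counts 51000 {{prepare()}} calls without the skip and 1000 with it. Whether skipping is safe in the real evaluators depends on what their {{prepare()}} actually does, which this sketch does not model.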

> Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-8313
>                 URL: https://issues.apache.org/jira/browse/HIVE-8313
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>
> Consider the following query:
> {code}
> SELECT foo, bar, goo, id
> FROM myTable
> WHERE id IN ( 'A', 'B', 'C', 'D', ... , 'ZZZZZZ' );
> {code}
> When the IN clause has several thousand elements (and the table has several million rows), the query above takes orders of magnitude longer to run on Hive 0.12 than on, say, Hive 0.10.
> I have a possibly incomplete fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)