Posted to dev@hive.apache.org by "Mithun Radhakrishnan (JIRA)" <ji...@apache.org> on 2014/09/30 22:58:34 UTC
[jira] [Commented] (HIVE-8313) Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153741#comment-14153741 ]
Mithun Radhakrishnan commented on HIVE-8313:
--------------------------------------------
This seems to be related to the changes introduced in HIVE-4209, which added caching for the evaluation of deterministic sub-expressions.
In this particular case, the problem occurs in {{ExprNodeGenericFuncEvaluator::_evaluate()}}:
{code:title=ExprNodeGenericFuncEvaluator.java|borderStyle=solid}
  @Override
  protected Object _evaluate(Object row, int version) throws HiveException {
    rowObject = row;
    if (ObjectInspectorUtils.isConstantObjectInspector(outputOI) &&
        isDeterministic()) {
      // The output of this UDF is constant, so don't even bother evaluating.
      return ((ConstantObjectInspector)outputOI).getWritableConstantValue();
    }
    for (int i = 0; i < deferredChildren.length; i++) {
      deferredChildren[i].prepare(version);
    }
    return genericUDF.evaluate(deferredChildren);
  }
{code}
In Hive 0.10, the {{deferredChildren[i].evaluate()}} call would be skipped in its entirety for "non-eager" evaluation. In Hive 0.12, that condition is checked inside {{prepare()}}, on every invocation, for *each record*. With thousands of children and millions of records, the cost is explosive.
A lot of this cost can be saved by skipping {{prepare()}} for {{ExprNodeEvaluator}}s that yield the same value regardless of the row, e.g. {{ExprNodeConstantEvaluator}} and {{ExprNodeNullEvaluator}}. I'll post a patch for this shortly.
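To make the idea concrete, here is a minimal, self-contained sketch of skipping {{prepare()}} for row-invariant children. It is not the patch and does not use Hive's classes: {{PrepareSkipSketch}}, {{Evaluator}}, {{isRowInvariant()}}, {{ConstantEvaluator}}, {{ColumnEvaluator}}, {{InFunction}} and {{Counter}} are all hypothetical names invented for illustration, and the per-child cost is modelled only as a call counter.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-ins for illustration only; these are NOT Hive's classes.
class PrepareSkipSketch {

    /** Shared counter so we can observe how often prepare() runs. */
    static final class Counter {
        long prepareCalls = 0;
    }

    /** Minimal evaluator contract (assumed for this sketch). */
    interface Evaluator {
        boolean isRowInvariant();      // true if the value never depends on the row
        void prepare(int version);     // per-row bookkeeping we would like to avoid
        Object evaluate(Object row);
    }

    /** Always yields the same value, like a literal in an IN list. */
    static final class ConstantEvaluator implements Evaluator {
        private final Object value;
        private final Counter counter;

        ConstantEvaluator(Object value, Counter counter) {
            this.value = value;
            this.counter = counter;
        }

        public boolean isRowInvariant() { return true; }
        public void prepare(int version) { counter.prepareCalls++; }
        public Object evaluate(Object row) { return value; }
    }

    /** Yields the row itself; stands in for a column reference such as id. */
    static final class ColumnEvaluator implements Evaluator {
        private final Counter counter;

        ColumnEvaluator(Counter counter) { this.counter = counter; }

        public boolean isRowInvariant() { return false; }
        public void prepare(int version) { counter.prepareCalls++; }
        public Object evaluate(Object row) { return row; }
    }

    /** IN-style function: children[0] is the probe, the rest are candidates. */
    static final class InFunction {
        private final Evaluator[] children;
        private final Evaluator[] toPrepare;

        InFunction(Evaluator[] children, boolean skipRowInvariant) {
            this.children = children;
            // Decide once, at construction time, which children need prepare().
            this.toPrepare = skipRowInvariant
                ? Arrays.stream(children)
                        .filter(c -> !c.isRowInvariant())
                        .toArray(Evaluator[]::new)
                : children;
        }

        boolean evaluate(Object row, int version) {
            for (Evaluator child : toPrepare) {
                child.prepare(version);
            }
            Object probe = children[0].evaluate(row);
            for (int i = 1; i < children.length; i++) {
                if (probe.equals(children[i].evaluate(row))) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Runs the function over synthetic rows and returns the number of prepare() calls. */
    static long countPrepareCalls(int rows, int constants, boolean skipRowInvariant) {
        Counter counter = new Counter();
        Evaluator[] children = new Evaluator[constants + 1];
        children[0] = new ColumnEvaluator(counter);
        for (int i = 1; i <= constants; i++) {
            children[i] = new ConstantEvaluator("k" + i, counter);
        }
        InFunction in = new InFunction(children, skipRowInvariant);
        for (int r = 0; r < rows; r++) {
            in.evaluate("k" + r, r);
        }
        return counter.prepareCalls;
    }

    public static void main(String[] args) {
        System.out.println("without skipping: " + countPrepareCalls(1000, 50, false));
        System.out.println("with skipping:    " + countPrepareCalls(1000, 50, true));
    }
}
```

In this toy model, 1000 rows against 50 constants cost 51000 {{prepare()}} calls without the skip and 1000 with it, since only the single row-dependent child still needs preparing. The filtering is done once in the constructor rather than per row, which is the point of the optimization; how the real evaluators would expose "row-invariant" is left open here.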
> Optimize evaluation for ExprNodeConstantEvaluator and ExprNodeNullEvaluator
> ---------------------------------------------------------------------------
>
> Key: HIVE-8313
> URL: https://issues.apache.org/jira/browse/HIVE-8313
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0, 0.13.0, 0.14.0
> Reporter: Mithun Radhakrishnan
> Assignee: Mithun Radhakrishnan
>
> Consider the following query:
> {code}
> SELECT foo, bar, goo, id
> FROM myTable
> WHERE id IN ( 'A', 'B', 'C', 'D', ... , 'ZZZZZZ' );
> {code}
> One finds that when the IN clause has several thousand elements (and the table has several million rows), the query above takes orders of magnitude longer to run on Hive 0.12 than on, say, Hive 0.10.
> I have a possibly incomplete fix.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)