Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/08/11 04:11:14 UTC

[jira] Created: (HIVE-746) constant folding

constant folding
----------------

                 Key: HIVE-746
                 URL: https://issues.apache.org/jira/browse/HIVE-746
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain


Constant expressions are not folded at compile time. For example, the query

select 1+2 from src

evaluates 1+2 for every row instead of once at compile time.
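To make the cost concrete, here is a toy model in plain Python. It is not Hive code, and `Const`, `Add`, `Evaluator`, `run_query` and `fold` are hypothetical names. It counts how many additions are performed for 1000 input rows, with and without compile-time folding:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Add]


class Evaluator:
    """Evaluates expression trees and counts the additions it performs."""

    def __init__(self) -> None:
        self.additions = 0

    def evaluate(self, expr: Expr) -> int:
        if isinstance(expr, Const):
            return expr.value
        self.additions += 1
        return self.evaluate(expr.left) + self.evaluate(expr.right)


def run_query(expr: Expr, num_rows: int, ev: Evaluator) -> list[int]:
    """Evaluate the SELECT expression once per input row."""
    return [ev.evaluate(expr) for _ in range(num_rows)]


def fold(expr: Expr, ev: Evaluator) -> Const:
    """Compile-time folding; assumes the tree contains only constants."""
    return Const(ev.evaluate(expr))


expr = Add(Const(1), Const(2))

unfolded = Evaluator()
rows = run_query(expr, 1000, unfolded)
print(unfolded.additions)  # 1000: 1+2 is recomputed for every row

folded = Evaluator()
plan = fold(expr, folded)  # 1+2 is computed once, before any row is read
rows_folded = run_query(plan, 1000, folded)
print(folded.additions)  # 1
```

Without folding, the work grows with the number of rows. With folding, it is a one-time cost paid during compilation.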

This becomes more interesting for scenarios like:

select unix_timestamp() from src;

The UDF should be evaluated only once, and that single value should be returned for every row. Currently, however, it is marked as non-deterministic and evaluated for every row.
This can have bad side effects on partition pruning, among other things.

In MySQL, every row gets the same value, no matter how long the query takes to run.
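Here is a minimal sketch of the desired once-per-query semantics. It assumes a hypothetical `QueryContext` that reads the clock once when the query starts; this is not an existing Hive API. A fake clock advances one second per read, starting from an arbitrary value, which makes the difference visible:

```python
import itertools
import time
from typing import Callable

Clock = Callable[[], float]


class QueryContext:
    """Hypothetical per-query state: the start time is read exactly once."""

    def __init__(self, clock: Clock = time.time) -> None:
        self.start_unix_ts = int(clock())


def unix_timestamp_per_row(clock: Clock = time.time) -> int:
    """Current behavior: the clock is read again for every row."""
    return int(clock())


def unix_timestamp_per_query(ctx: QueryContext) -> int:
    """Proposed behavior: every row sees the query's start time."""
    return ctx.start_unix_ts


# Fake clock (arbitrary start value) that advances one second on every read.
ticks = itertools.count(1_250_000_000)


def fake_clock() -> float:
    return float(next(ticks))


per_row = [unix_timestamp_per_row(fake_clock) for _ in range(3)]
ctx = QueryContext(fake_clock)
per_query = [unix_timestamp_per_query(ctx) for _ in range(3)]
print(per_row)    # [1250000000, 1250000001, 1250000002]
print(per_query)  # [1250000003, 1250000003, 1250000003]
```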

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-746) constant folding

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743517#action_12743517 ] 

Zheng Shao commented on HIVE-746:
---------------------------------

We can do this in the optimization phase (before column pruning).
This will be done together with HIVE-757.

We walk the operator graph starting from the root operators, and visit an operator only after all of its parents have been visited.
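This visiting rule is a parents-first (topological) order over the operator DAG. Below is a minimal Python sketch using Kahn's algorithm. It assumes the plan is a hypothetical dict mapping each operator name to its child operator names, not Hive's real operator classes:

```python
from collections import deque


def parents_first_order(children: dict[str, list[str]], roots: list[str]) -> list[str]:
    """Visit each operator only after all of its parents (Kahn's algorithm)."""
    # Number of not-yet-visited parents for every operator.
    pending = {op: 0 for op in children}
    for kids in children.values():
        for kid in kids:
            pending[kid] = pending.get(kid, 0) + 1

    ready = deque(roots)  # roots have no parents
    order: list[str] = []
    while ready:
        op = ready.popleft()
        order.append(op)  # every parent of `op` has already been visited
        for kid in children.get(op, []):
            pending[kid] -= 1
            if pending[kid] == 0:
                ready.append(kid)
    return order


# Illustrative plan: two table scans feeding a join (operator names are examples).
plan = {
    "TS[src1]": ["SEL[a]"],
    "TS[src2]": ["SEL[b]"],
    "SEL[a]": ["JOIN"],
    "SEL[b]": ["JOIN"],
    "JOIN": ["FS"],
    "FS": [],
}
order = parents_first_order(plan, ["TS[src1]", "TS[src2]"])
print(order)  # ['TS[src1]', 'TS[src2]', 'SEL[a]', 'SEL[b]', 'JOIN', 'FS']
```

JOIN is visited only after both of its parents. Any constant columns coming from either branch are therefore already known when its expressions are folded.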

For each expression tree in an operator, we try to pre-compute part or all of the tree with a bottom-up pass.
A leaf node is constant if it is a constant node, or if it references a column from its parent that is constant. In the latter case, we fill in the constant value directly.
A non-leaf node is constant if all of its children are constant (trivially true when it has no children) and the node itself is deterministic (every node except a non-deterministic UDF/GenericUDF).
We fold every constant non-leaf node into a single constant node.
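The rules above describe a post-order rewrite of each expression tree. Below is a minimal Python sketch. It assumes hypothetical `Const`, `Column` and `Call` node types (not Hive's expression classes), a `deterministic` flag standing in for the UDF/GenericUDF determinism check, and a `parent_consts` map holding the constant columns already found in the parent operators:

```python
import math
import operator
import random
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Const:
    value: Any


@dataclass(frozen=True)
class Column:
    name: str  # a column produced by the parent operator


@dataclass(frozen=True)
class Call:
    name: str
    fn: Callable[..., Any]
    args: tuple = ()
    deterministic: bool = True


Expr = Union[Const, Column, Call]


def fold(expr: Expr, parent_consts: dict[str, Any]) -> Expr:
    """Bottom-up constant folding of one expression tree.

    parent_consts maps parent column names to values already known to be constant.
    """
    if isinstance(expr, Const):
        return expr
    if isinstance(expr, Column):
        # Leaf referencing a constant parent column: fill in the value directly.
        if expr.name in parent_consts:
            return Const(parent_consts[expr.name])
        return expr
    # Non-leaf: fold the children first.
    args = tuple(fold(a, parent_consts) for a in expr.args)
    # all() is True for zero children, so deterministic no-arg calls fold too.
    if expr.deterministic and all(isinstance(a, Const) for a in args):
        return Const(expr.fn(*(a.value for a in args)))
    return Call(expr.name, expr.fn, args, expr.deterministic)


# 1 + 2 folds to a single constant.
print(fold(Call("+", operator.add, (Const(1), Const(2))), {}))  # Const(value=3)

# x * (y + 1), where the parent produces x as the constant 10 and y per row.
mixed = fold(
    Call("*", operator.mul, (Column("x"), Call("+", operator.add, (Column("y"), Const(1))))),
    {"x": 10},
)
# A deterministic call with no children folds.
pi = fold(Call("pi", lambda: math.pi), {})
# rand() + 1 is never folded, because rand() is non-deterministic.
noisy = fold(
    Call("+", operator.add, (Call("rand", random.random, deterministic=False), Const(1))), {}
)
```

In `mixed`, only the reference to `x` is replaced by `Const(10)`. The `y + 1` subtree stays, because `y` varies per row.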

After constant folding is done, column pruning should be able to prune out those constant columns in the intermediate operators.
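Here is a minimal sketch of why folding helps column pruning. It assumes expressions are hypothetical nested tuples, and that pruning simply keeps the parent columns some downstream expression still references. This is a simplification, not Hive's column pruner. Once the constant column `one` has been substituted into the child's expression, nothing references it any more:

```python
def referenced(expr: tuple) -> set[str]:
    """Columns referenced by an expression written as nested tuples:
    ("col", name), ("const", value) or ("call", fn_name, *args)."""
    kind = expr[0]
    if kind == "col":
        return {expr[1]}
    if kind == "const":
        return set()
    refs: set[str] = set()
    for arg in expr[2:]:
        refs |= referenced(arg)
    return refs


def prune(parent_output: list[str], child_exprs: list[tuple]) -> list[str]:
    """Keep only the parent columns that some child expression still needs."""
    needed = set().union(*(referenced(e) for e in child_exprs))
    return [col for col in parent_output if col in needed]


# The parent emits `key` and a constant column `one` (always 1).
before = [("call", "+", ("col", "key"), ("col", "one"))]
after = [("call", "+", ("col", "key"), ("const", 1))]  # `one` folded in
print(prune(["key", "one"], before))  # ['key', 'one']
print(prune(["key", "one"], after))   # ['key']
```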




[jira] Assigned: (HIVE-746) constant folding

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-746:
-------------------------------

    Assignee: Zheng Shao
