Posted to dev@hive.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/11/19 00:07:44 UTC

[jira] Commented: (HIVE-72) wrong results if partition pruning not strict and no map-reduce job needed

    [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648794#action_12648794 ] 

Ashish Thusoo commented on HIVE-72:
-----------------------------------

I think the correct fix is for the prune call to return something that indicates whether there were any unknown partitions.
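
One way to picture this suggestion is a result object that keeps matched and unknown partitions apart. The sketch below is hypothetical: PruneSketch, Tri, PruneResult and hasUnknownPartitions() are illustrative names, not the actual PartitionPruner API, and partitions are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Hive itself.
final class PruneSketch {
    /** Three-valued outcome of evaluating the partition predicate. */
    enum Tri { TRUE, FALSE, UNKNOWN }

    /** What prune() returns: partitions that matched, plus those that evaluated to unknown. */
    static final class PruneResult {
        final List<String> confirmed = new ArrayList<>();
        final List<String> unknown = new ArrayList<>();

        /** True if at least one partition could not be decided at pruning time. */
        boolean hasUnknownPartitions() {
            return !unknown.isEmpty();
        }
    }

    /** Evaluates the predicate per partition and records UNKNOWN separately from TRUE. */
    static PruneResult prune(List<String> partitions, Function<String, Tri> predicate) {
        PruneResult result = new PruneResult();
        for (String partition : partitions) {
            switch (predicate.apply(partition)) {
                case TRUE:
                    result.confirmed.add(partition);
                    break;
                case UNKNOWN:
                    result.unknown.add(partition);
                    break;
                case FALSE:
                    break; // definitely pruned
            }
        }
        return result;
    }
}
```

The caller can then still scan the unknown partitions, but it knows that the pruning result is not exact.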

Inline Comments
ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionPruner.java:278	incomplete javadocs.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:2938	What happens when the number of parts is actually 0, i.e. no parts match the criteria (they are not unknown; they all evaluate to false)? In that case we would not be making this optimization. The test case for that in our tests would be select * from srcpart where srcpart.ds = '2000-01-01'. We clearly do not want to turn off the optimization when this happens, right?
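
To make the distinction concrete: the decision should depend on whether any partition was unknown, not on how many partitions matched. A minimal sketch of that check follows; FetchDecisionSketch and canSkipMapReduce are made-up names and the parameters are assumptions, not the code in SemanticAnalyzer.

```java
// Hypothetical sketch; not the actual SemanticAnalyzer logic.
final class FetchDecisionSketch {
    /**
     * Decides whether a query can be answered without a map-reduce job.
     *
     * @param selectStarOnly    true if the query is a plain select * with no other operators
     * @param matchedPartitions number of partitions whose predicate evaluated to true
     * @param unknownPartitions number of partitions whose predicate evaluated to unknown
     */
    static boolean canSkipMapReduce(boolean selectStarOnly, int matchedPartitions, int unknownPartitions) {
        if (matchedPartitions < 0 || unknownPartitions < 0) {
            throw new IllegalArgumentException("partition counts must be non-negative");
        }
        // matchedPartitions == 0 is deliberately allowed: an empty result is still exact,
        // so only unknown partitions turn the optimization off.
        return selectStarOnly && unknownPartitions == 0;
    }
}
```

Under this reading, the srcpart.ds = '2000-01-01' case corresponds to canSkipMapReduce(true, 0, 0), which keeps the optimization on.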

> wrong results if partition pruning not strict and no map-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> Suppose T is a table partitioned on ds, where ds is a string column. The following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> For a.ds=2008-01-01 or any similar comparison (e.g. a.ds=1),
>  evaluate(Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to null, so pruning returns null (unknown), not FALSE.
> All unknowns are also accepted, so all partitions are accepted, which explains this behavior.
> The filter is not invoked since it is a select * query, so no map-reduce job is started.
> The fix is to just turn off this optimization if pruning indicates that there can be unknown partitions.
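
The null result described in the issue can be reproduced in isolation. The sketch below assumes the comparison casts both sides to double and treats a failed cast as null; DoubleCompareSketch, toDouble and evaluate are illustrative stand-ins, not Hive's actual comparison functions.

```java
// Hypothetical sketch of a (Double, Double) comparison with SQL-style null handling.
final class DoubleCompareSketch {
    /** Casts a string to Double, returning null when the string is not a valid double. */
    static Double toDouble(String value) {
        try {
            return Double.valueOf(value);
        } catch (NumberFormatException e) {
            return null; // e.g. "2008-11-01" cannot be parsed as a double
        }
    }

    /** Returns null (unknown) if either side is null, otherwise the result of the comparison. */
    static Boolean evaluate(Double left, Double right) {
        if (left == null || right == null) {
            return null;
        }
        return left.doubleValue() == right.doubleValue();
    }
}
```

With these definitions, evaluate(toDouble("2008-11-01"), 1.0) is null rather than false, so a pruner that accepts unknowns keeps the partition.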
