You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2008/11/18 23:23:44 UTC

[jira] Created: (HIVE-72) wrong results if partition pruning not script and no mep-reduce job needed

wrong results if partition pruning not script and no mep-reduce job needed
--------------------------------------------------------------------------

                 Key: HIVE-72
                 URL: https://issues.apache.org/jira/browse/HIVE-72
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Namit Jain
            Assignee: Namit Jain


Suppose T is a partitioned table on ds, where ds is a string column, the following queries:

 SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
 SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;


return the first row from the first partition.



This is because of the typecast to double.

for a.ds=2008-01-01 or anything (a.ds=1),

 evaluate (Double, Double) is invoked at partition pruning.

Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.

filter is not invoked since it is a select * query, so map-reduce job is started.
We just turn off this optimization if pruning indicates that there can be unknown partitions. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648794#action_12648794 ] 

Ashish Thusoo commented on HIVE-72:
-----------------------------------

I think the correct way for this is to return something from the prune call to indicate that there were some unknown partitions.

Inline Comments
ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionPruner.java:278	incomplete javadocs.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:2938	What happens in case the parts are actually 0 i.e. no parts match the criteria (they are not unknown but they all return false). We would in that case not be making this optimization. The test case for that would be select * from srcpart where srcpart.ds = '2000-01-01' in our tests. We clearly do not want to turn off the optimization when this happens. right? 

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648838#action_12648838 ] 

Namit Jain commented on HIVE-72:
--------------------------------

changed prune() to return the two lists: confirmed and unknown separately and let the client handle it

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-72:
---------------------------

    Summary: wrong results if partition pruning not strict and no mep-reduce job needed  (was: wrong results if partition pruning not script and no mep-reduce job needed)

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-72:
---------------------------

    Attachment: patch1.mapred.txt

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1.mapred.txt
>
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-72:
---------------------------

    Status: Patch Available  (was: Open)

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1.mapred.txt
>
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649502#action_12649502 ] 

Ashish Thusoo commented on HIVE-72:
-----------------------------------

+1

looks good to me.


> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1.mapred.txt
>
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-72:
-------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)
      Component/s: Query Processor

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.3.0
>
>         Attachments: patch1.mapred.txt
>
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-72) wrong results if partition pruning not strict and no mep-reduce job needed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HIVE-72:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Namit!

> wrong results if partition pruning not strict and no mep-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.20.0
>
>         Attachments: patch1.mapred.txt
>
>
> Suppose T is a partitioned table on ds, where ds is a string column, the following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> for a.ds=2008-01-01 or anything (a.ds=1),
>  evaluate (Double, Double) is invoked at partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains this behavior.
> filter is not invoked since it is a select * query, so map-reduce job is started.
> We just turn off this optimization if pruning indicates that there can be unknown partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.