Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2011/08/03 23:45:27 UTC

[jira] [Created] (HIVE-2344) filter is removed due to regression of HIVE-1538

filter is removed due to regression of HIVE-1538
------------------------------------------------

                 Key: HIVE-2344
                 URL: https://issues.apache.org/jira/browse/HIVE-2344
             Project: Hive
          Issue Type: Bug
            Reporter: He Yongqiang
            Assignee: Amareshwari Sriramadasu


 select * from
 (
   select type_bucket, randum123
   from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
   where randum123 <= 0.1
 ) s
 where s.randum123 > 0.1 limit 20;

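This query returns rows, even though no row can satisfy both randum123 <= 0.1
(inner query) and s.randum123 > 0.1 (outer query).

In addition, the plan for the inner query

 explain
 select type_bucket, randum123
 from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
 where randum123 <= 0.1

contains no filter at all, so the predicate randum123 <= 0.1 has been dropped.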
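To see why any returned row indicates a bug, here is a minimal Python simulation of the nested query (a sketch, not Hive code). It assumes randum123 is computed exactly once per row, so the inner and outer filters see the same value; the function name run_query and the row count are hypothetical. With the inner filter in place the result is always empty; dropping the inner filter, which is what the EXPLAIN output suggests happened, lets the outer filter return rows.

```python
import random


def run_query(num_rows, keep_inner_filter, seed=0):
    """Simulate the nested query. Each row gets a single randum123 value,
    because a SELECT-list expression is evaluated once per row."""
    rng = random.Random(seed)
    # (type_bucket, randum123) pairs produced by the innermost SELECT
    inner = [(i, rng.random()) for i in range(num_rows)]
    if keep_inner_filter:
        inner = [r for r in inner if r[1] <= 0.1]   # where randum123 <= 0.1
    outer = [r for r in inner if r[1] > 0.1]        # where s.randum123 > 0.1
    return outer[:20]                               # limit 20


correct = run_query(1000, keep_inner_filter=True)
buggy = run_query(1000, keep_inner_filter=False)
print(len(correct), len(buggy))
```

The correct plan can never produce a row, so any output at all means the inner predicate was lost somewhere in the plan.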

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2344:
------------------------------------------

    Attachment: hive-patch-2344-2.txt

Here is a patch with the change: a filter on a UDF selected as a column in a SELECT is no longer pushed beyond the SELECT.

All tests passed with the patch.


[jira] [Assigned] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-2344:
--------------------------------

    Assignee: Amareshwari Sriramadasu  (was: Ido Hadanny)

Whoops, reassigned to Ido accidentally; reassigning to Amareshwari.



[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081531#comment-13081531 ] 

Amareshwari Sriramadasu commented on HIVE-2344:
-----------------------------------------------

The problem is that the Select operator chooses to push down the filter 'randum123 <= 0.1' even though it is non-deterministic. Later, the filter is discarded instead of being pushed down, because it is non-deterministic, so it ends up nowhere. I will upload a patch with the fix shortly.

Any other filter on a UDF selected as a column alias in a SELECT will also always be pushed down. Do we want this? We might address it in a separate JIRA.
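The following is a toy Python model of the sequence described in this comment, combined with the duplicate-filter removal from HIVE-1538. It is an illustration under stated assumptions, not Hive's OpProcFactory logic: the names Predicate and place_filter are hypothetical, and the model assumes the original Filter operator is removed whenever its predicate has been marked as pushed (the behavior controlled by hive.ppd.remove.duplicatefilters).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Predicate:
    text: str
    deterministic: bool


def place_filter(pred: Predicate, select_accepts: bool,
                 remove_duplicate_filters: bool) -> Tuple[bool, List[str]]:
    """Return (original filter kept?, operators below SEL that got the predicate)
    for a toy plan FIL -> SEL -> child. Hypothetical model, not Hive code."""
    pushed_to: List[str] = []
    keep_original_filter = True
    if select_accepts:
        if pred.deterministic:
            pushed_to.append("below SEL")
        # A non-deterministic predicate is refused below the SELECT and dropped.
        if remove_duplicate_filters:
            # The predicate was marked as pushed, so the original FIL looks
            # redundant and is removed, even if nothing was actually pushed.
            keep_original_filter = False
    return keep_original_filter, pushed_to


bug = Predicate("randum123 <= 0.1", deterministic=False)
print(place_filter(bug, select_accepts=True, remove_duplicate_filters=True))   # filter lost
print(place_filter(bug, select_accepts=True, remove_duplicate_filters=False))  # workaround
print(place_filter(bug, select_accepts=False, remove_duplicate_filters=True))  # fix
```

Under this model the predicate ends up nowhere only when the Select operator accepts it and duplicate-filter removal is on, matching the missing filter in the plan; turning duplicate-filter removal off, or refusing the pushdown at the SELECT, keeps the original filter.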


[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2344:
-----------------------------

      Resolution: Fixed
    Release Note: When predicate pushdown is enabled, Hive would previously incorrectly push down predicates on non-deterministic function invocations when those were indirectly referenced via a nested SELECT list rather than directly in the filter expression.  After this change, Hive no longer pushes down filters over indirect references to function invocations of any kind (regardless of determinism).  Note that in Hive, even builtin operators such as + and CAST are treated as function invocations.
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed.  Thanks Amareshwari!
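A minimal sketch of the rule stated in the release note, assuming a simplified expression model. The classes Column, Literal, and FuncCall, the function can_push_past_select, and the alias next_bucket are hypothetical and not Hive APIs. A predicate may move below the SELECT only if every alias it references maps to a plain column; any function call blocks it, including deterministic builtins such as + and CAST.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: object


@dataclass
class FuncCall:
    name: str                                   # e.g. "rand", "+", "cast"
    args: List["Expr"] = field(default_factory=list)


Expr = Union[Column, Literal, FuncCall]


def can_push_past_select(referenced: Set[str], select_list: Dict[str, Expr]) -> bool:
    """Allow pushdown only when every referenced alias is a plain column.
    Determinism is deliberately not consulted, mirroring the release note."""
    return all(isinstance(select_list[alias], Column) for alias in referenced)


select_list: Dict[str, Expr] = {
    "type_bucket": Column("type_bucket"),
    "randum123": FuncCall("cast", [FuncCall("rand")]),
    "next_bucket": FuncCall("+", [Column("type_bucket"), Literal(1)]),
}
print(can_push_past_select({"type_bucket"}, select_list))  # True
print(can_push_past_select({"randum123"}, select_list))    # False
print(can_push_past_select({"next_bucket"}, select_list))  # False: + is a function too
```

Treating every function call the same avoids both the correctness problem with rand() and the cost of re-evaluating an expensive deterministic UDF below the SELECT.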



[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081991#comment-13081991 ] 

John Sichi commented on HIVE-2344:
----------------------------------

+1.  Will commit when tests pass.


[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2344:
------------------------------------------

    Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2344:
------------------------------------------

    Attachment: hive-patch-2344.txt

bq. Any other filter on 'udf selected as column alias in select' will also be pushed down always. Do we want to do this? Might address in a separate jira.

This patch addresses that as well.


[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079226#comment-13079226 ] 

Amareshwari Sriramadasu commented on HIVE-2344:
-----------------------------------------------

Looking into it.


[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082469#comment-13082469 ] 

John Sichi commented on HIVE-2344:
----------------------------------

It's possible to avoid the double computation (by pushing the selection down too, similar to column pruning), but I'm fine with skipping that and not pushing the expression beyond the select.
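A rough sketch of the alternative mentioned here, assuming a deterministic stand-in for an expensive UDF (CountingUdf and plan_compute_once are hypothetical names, not Hive code). The aliased expression is computed once below the filter, and both the pushed filter and the SELECT reuse that value, so the UDF runs exactly once per input row.

```python
class CountingUdf:
    """Wraps a function and counts how many times it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return self.fn(row)


def plan_compute_once(rows, udf):
    """Push the computed column down together with the filter, much like
    column pruning pushes projections, so the value is never recomputed."""
    out = []
    for row in rows:
        value = udf(row)              # single evaluation below the filter
        if value <= 0.1:              # pushed filter reads the computed column
            out.append((row, value))  # SELECT reuses it instead of calling udf again
    return out


udf = CountingUdf(lambda row: (row % 100) / 100)
result = plan_compute_once(range(1000), udf)
print(udf.calls, len(result))  # 1000 110
```

This keeps the early filtering benefit of pushdown while avoiding the double computation, at the price of a more invasive plan rewrite.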


[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079131#comment-13079131 ] 

John Sichi commented on HIVE-2344:
----------------------------------

The workaround is:

set hive.ppd.remove.duplicatefilters=false;



[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2344:
------------------------------------------

        Fix Version/s: 0.8.0
    Affects Version/s: 0.8.0
               Status: Patch Available  (was: Open)


[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212986#comment-13212986 ] 

Hudson commented on HIVE-2344:
------------------------------

Integrated in Hive-trunk-h0.21 #1268 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1268/])
    HIVE-2791: filter is still removed due to regression of HIVE-1538 although HIVE-2344 (binlijin via hashutosh) (Revision 1291916)

     Result = SUCCESS
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291916
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/ppd2.q
* /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out

                

[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2344:
-----------------------------

    Assignee: Ido Hadanny  (was: Amareshwari Sriramadasu)
      Status: Open  (was: Patch Available)

I'm getting many regression test failures due to EXPLAIN plan changes.



[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081567#comment-13081567 ] 

jiraposter@reviews.apache.org commented on HIVE-2344:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1404/
-----------------------------------------------------------

Review request for hive, John Sichi and Yongqiang He.


Summary
-------

Any filter on a UDF selected as a column alias in a SELECT is pushed down through the Select operator, which it should not be. The patch addresses this by walking through the UDF expression again.
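The summary does not spell out what the walk checks. As a hedged illustration only, assuming its purpose is to decide whether the expression behind an alias is deterministic, a recursive walk could look like this (DETERMINISTIC, Expr, and is_deterministic are hypothetical, not Hive's ExprWalkerProcFactory API). The later revision of the patch went further and stopped pushing such filters regardless of determinism.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical registry of which functions are deterministic (rand() is not).
DETERMINISTIC = {"cast": True, "+": True, "rand": False}


@dataclass
class Expr:
    func: str = ""                  # empty string means a plain column reference
    column: str = ""
    args: List["Expr"] = field(default_factory=list)


def is_deterministic(expr: Expr) -> bool:
    """Recursively walk the expression tree; one non-deterministic call
    anywhere makes the whole expression non-deterministic."""
    if not expr.func:
        return True
    return DETERMINISTIC[expr.func] and all(is_deterministic(a) for a in expr.args)


randum123 = Expr(func="cast", args=[Expr(func="rand")])
plus = Expr(func="+", args=[Expr(column="a"), Expr(column="b")])
print(is_deterministic(randum123), is_deterministic(plus))  # False True
```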


This addresses bug HIVE-2344.
    https://issues.apache.org/jira/browse/HIVE-2344


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 1153812 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153812 
  trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1404/diff


Testing
-------


Thanks,

Amareshwari




[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082293#comment-13082293 ] 

Amareshwari Sriramadasu commented on HIVE-2344:
-----------------------------------------------

Sorry John, the earlier patch has a bug.

bq. Any other filter on 'udf selected as column alias in select' will also be pushed down always. Do we want to do this?
More on this: currently the filter (together with the UDF) is pushed all the way down to the TableScan. As a result, the UDF is applied twice for every qualifying row (once in the pushed filter and once in the SELECT), which is costly when the UDF is expensive. So I propose that we do not push such a filter beyond the SELECT. Thoughts?
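To make the cost argument concrete, here is a small Python sketch that counts UDF invocations in two plans. The names are hypothetical, and the UDF is a deterministic stand-in so that both plans return the same rows. When the filter is pushed to the TableScan with the alias replaced by the UDF call, the UDF runs once per row in the filter and again for each qualifying row in the SELECT; keeping the filter above the SELECT needs only one call per row.

```python
class CountingUdf:
    """Wraps a function and counts how many times it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return self.fn(row)


def plan_pushed_to_tablescan(rows, udf):
    # The comprehension's condition runs first (the pushed filter); the element
    # expression then calls the UDF again for rows that pass (the SELECT).
    return [(row, udf(row)) for row in rows if udf(row) <= 0.1]


def plan_filter_above_select(rows, udf):
    projected = [(row, udf(row)) for row in rows]   # SELECT computes once per row
    return [r for r in projected if r[1] <= 0.1]    # filter reuses the value


pushed_udf = CountingUdf(lambda row: (row % 100) / 100)
kept_udf = CountingUdf(lambda row: (row % 100) / 100)
pushed = plan_pushed_to_tablescan(range(1000), pushed_udf)
kept = plan_filter_above_select(range(1000), kept_udf)
assert pushed == kept
print(pushed_udf.calls, kept_udf.calls)  # 1110 1000
```

With 110 qualifying rows out of 1000, the pushed plan makes 1110 calls instead of 1000; the overhead grows with the fraction of qualifying rows and with the UDF's cost.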


[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2344:
------------------------------------------

    Attachment: ppd_udf_col.q.out.txt

bq. Any other filter on 'udf selected as column alias in select' will also be pushed down always.

Attaching test output with faulty explain plans. 


[jira] [Commented] (HIVE-2344) filter is removed due to regression of HIVE-1538

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082481#comment-13082481 ] 

jiraposter@reviews.apache.org commented on HIVE-2344:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1404/
-----------------------------------------------------------

(Updated 2011-08-10 17:06:46.444966)


Review request for hive, John Sichi and Yongqiang He.


Changes
-------

A filter on a UDF selected as a column alias in a SELECT is no longer pushed beyond the SELECT.


Summary (updated)
-------

Any filter on a UDF selected as a column alias in a SELECT is pushed down through the Select operator, which it should not be.


This addresses bug HIVE-2344.
    https://issues.apache.org/jira/browse/HIVE-2344


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 1156069 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1156069 
  trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1404/diff


Testing (updated)
-------

All tests pass with the patch.


Thanks,

Amareshwari


