Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/01/28 01:52:59 UTC

[jira] Created: (HIVE-253) rand() gets precomputated in compilation phase

rand() gets precomputated in compilation phase
----------------------------------------------

                 Key: HIVE-253
                 URL: https://issues.apache.org/jira/browse/HIVE-253
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Zheng Shao


SELECT * FROM t WHERE rand() < 0.01;

Hive will say: "No need to submit job", because the condition is evaluated once during compilation and that single evaluation comes out false.

The rand() function is special in that it evaluates to a different value every time it is called. We should disallow computing its value in the compilation phase.

One way to do that is to add an annotation to UDFRand and check for it in the compilation phase.
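
The difference between folding the predicate once and evaluating it per row can be sketched in a few lines of Java. This is a toy model, not Hive code: the class and method names are made up for illustration, and a seeded java.util.Random stands in for rand().

```java
import java.util.Random;

/** Toy model of "WHERE rand() < threshold"; all names are hypothetical, not Hive's. */
final class RandFoldingDemo {

    /** Buggy behaviour: the predicate is evaluated once, before any row is read. */
    static int countWithPrecomputedPredicate(int rows, double threshold, Random rng) {
        boolean keep = rng.nextDouble() < threshold; // folded in the "compilation phase"
        return keep ? rows : 0;                      // either every row or no row at all
    }

    /** Intended behaviour: the predicate is evaluated again for every row. */
    static int countWithPerRowPredicate(int rows, double threshold, Random rng) {
        int kept = 0;
        for (int i = 0; i < rows; i++) {
            if (rng.nextDouble() < threshold) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int rows = 100_000;
        System.out.println("precomputed: "
                + countWithPrecomputedPredicate(rows, 0.01, new Random(42)));
        System.out.println("per row:     "
                + countWithPerRowPredicate(rows, 0.01, new Random(42)));
    }
}
```

With a threshold of 0.01 the precomputed variant returns no rows about 99% of the time, which matches the "No need to submit job" symptom, while the per-row variant keeps roughly 1% of the rows.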



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683123#action_12683123 ] 

Namit Jain commented on HIVE-253:
---------------------------------

Running tests for committing now.

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Assigned: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-253:
----------------------------------

    Assignee: Ashish Thusoo

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Ashish Thusoo
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-253:
----------------------------------

    Status: Patch Available  (was: Open)

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-253:
----------------------------------

    Attachment: hive-253.3.patch

The problem was a bug in the patch. While trying to avoid duplicating code, I introduced a bug where not all UDFs were being nulled in pruneExpr. I have now fixed the condition under which the nodes are replaced with null.
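
One plausible reading of "replaced with null" is three-valued logic in the pruner: a subexpression that cannot be evaluated at compile time becomes null (unknown), and a partition is dropped only when the whole predicate is definitely false. The Java sketch below illustrates that idea under this assumption. It is not the code from the patch, and every class and method name in it is hypothetical.

```java
import java.util.Collections;
import java.util.Map;

/** Toy partition pruner; every name here is hypothetical, not Hive's API. */
final class PruneExprSketch {

    interface Expr {}

    /** Predicate on a partition column, e.g. ds = '2008-04-08'. */
    static final class ColumnEquals implements Expr {
        final String column;
        final String value;
        ColumnEquals(String column, String value) { this.column = column; this.value = value; }
    }

    /** Stand-in for a predicate such as rand() < 0.01 that cannot be known at compile time. */
    static final class NonDeterministicCall implements Expr {}

    static final class And implements Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Or implements Expr {
        final Expr left;
        final Expr right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Returns TRUE, FALSE, or null when the result is unknown at compile time. */
    static Boolean evaluate(Expr e, Map<String, String> partition) {
        if (e instanceof ColumnEquals) {
            ColumnEquals c = (ColumnEquals) e;
            return c.value.equals(partition.get(c.column));
        }
        if (e instanceof NonDeterministicCall) {
            return null; // "nulled": the pruner must not guess a value
        }
        if (e instanceof And) {
            Boolean l = evaluate(((And) e).left, partition);
            Boolean r = evaluate(((And) e).right, partition);
            if (Boolean.FALSE.equals(l) || Boolean.FALSE.equals(r)) return Boolean.FALSE;
            if (l == null || r == null) return null;
            return Boolean.TRUE;
        }
        if (e instanceof Or) {
            Boolean l = evaluate(((Or) e).left, partition);
            Boolean r = evaluate(((Or) e).right, partition);
            if (Boolean.TRUE.equals(l) || Boolean.TRUE.equals(r)) return Boolean.TRUE;
            if (l == null || r == null) return null;
            return Boolean.FALSE;
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    /** A partition is pruned only when the predicate is definitely false. */
    static boolean keepPartition(Expr e, Map<String, String> partition) {
        return !Boolean.FALSE.equals(evaluate(e, partition));
    }

    public static void main(String[] args) {
        Map<String, String> partition = Collections.singletonMap("ds", "2008-04-09");
        Expr dsMatch = new ColumnEquals("ds", "2008-04-08");
        Expr rand = new NonDeterministicCall();
        System.out.println("rand only:   " + keepPartition(rand, partition));
        System.out.println("ds AND rand: " + keepPartition(new And(dsMatch, rand), partition));
        System.out.println("ds OR rand:  " + keepPartition(new Or(dsMatch, rand), partition));
    }
}
```

Under this scheme a predicate made of rand() alone keeps every partition, an AND with a non-matching partition column still prunes, and an OR with a non-matching partition column keeps the partition because the rand() side might be true at run time.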

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-253:
-------------------------------

    Assignee: Raghotham Murthy  (was: Ashish Thusoo)

Load balancing...

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Venky Iyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venky Iyer updated HIVE-253:
----------------------------

    Priority: Blocker  (was: Major)

Marking as blocker, per athusoo's recommendation.

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Zheng Shao
>            Priority: Blocker
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Commented: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682483#action_12682483 ] 

Namit Jain commented on HIVE-253:
---------------------------------

Looks good. I have some minor comments:

1. No need to include UDFRand in PartitionPruner.
2. Test rand_partitionpruner.q:
     Can you run it on a partitioned table, srcpart, instead?
     Can you add an explain plan that shows that all partitions are included?
3. Can you add more tests for an AND/OR of some expression and rand(), like
    Again, can you include an explain plan?

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-253:
-------------------------------

    Affects Version/s: 0.2.0
        Fix Version/s: 0.2.0

Marking this for 0.2.0 version.

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Ashish Thusoo
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Commented: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682536#action_12682536 ] 

Ashish Thusoo commented on HIVE-253:
------------------------------------

I am getting some failures in the tests...

    [junit] Begin query: input_testxpath2.q
    [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
    [junit] OK
    [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
    [junit] OK
    [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
    [junit] OK
    [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
    [junit] OK
    [junit] Loading data to table srcbucket
    [junit] OK
    [junit] Loading data to table srcbucket
    [junit] OK
    [junit] Loading data to table src
    [junit] OK
    [junit] Exception: Client Execution failed with error code = 10
    [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 10
    [junit]     at junit.framework.Assert.fail(Assert.java:47)
    [junit]     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath2(TestCliDriver.java:3815)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
    [junit]     at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit]     at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit]     at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit]     at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit]     at junit.framework.TestResult.run(TestResult.java:109)
    [junit]     at junit.framework.TestCase.run(TestCase.java:118)
    [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit]     at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
    [junit] Hive history file=/data/users/athusoo/commits/hive_trunk_ws6/ql/../build/ql/tmp/hive_job_log_athusoo

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-253:
----------------------------------

    Attachment: hive-253.1.patch

Created an annotation called UDFType. If a UDF is not deterministic, don't include it in the partition pruner.
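
A minimal sketch of how such an annotation could be declared and checked via reflection follows. The UDFType declared here is a local stand-in written for this example, not the class from the patch. The deterministic attribute, its default of true, and the RandUdf/UpperUdf classes are assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Illustrative only: a stand-in annotation and check, not the code from the patch. */
final class UdfTypeSketch {

    /** Assumed shape of the annotation; RUNTIME retention makes it visible to reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UDFType {
        boolean deterministic() default true;
    }

    /** Hypothetical rand()-style UDF, marked as non-deterministic. */
    @UDFType(deterministic = false)
    static final class RandUdf {
        double evaluate() {
            return Math.random();
        }
    }

    /** Hypothetical ordinary UDF with no annotation; treated as deterministic. */
    static final class UpperUdf {
        String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }
    }

    /** A compile-time pass could call this before evaluating a UDF early. */
    static boolean isDeterministic(Class<?> udfClass) {
        UDFType type = udfClass.getAnnotation(UDFType.class);
        return type == null || type.deterministic();
    }

    public static void main(String[] args) {
        System.out.println("RandUdf deterministic?  " + isDeterministic(RandUdf.class));
        System.out.println("UpperUdf deterministic? " + isDeterministic(UpperUdf.class));
    }
}
```

Defaulting to deterministic keeps existing UDFs unchanged, so only functions like rand() need the explicit marker.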

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-253:
----------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.2.0)
                   0.3.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed. Thanks, Raghu!

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.3.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-253:
----------------------------------

    Attachment: hive-253.2.patch

Incorporated Namit's comments. Created a couple more test cases.

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.



[jira] Commented: (HIVE-253) rand() gets precomputated in compilation phase

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682527#action_12682527 ] 

Namit Jain commented on HIVE-253:
---------------------------------

+1

looks good

> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
>                 Key: HIVE-253
>                 URL: https://issues.apache.org/jira/browse/HIVE-253
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Raghotham Murthy
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
