Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/01/28 01:52:59 UTC
[jira] Created: (HIVE-253) rand() gets precomputated in compilation
phase
rand() gets precomputated in compilation phase
----------------------------------------------
Key: HIVE-253
URL: https://issues.apache.org/jira/browse/HIVE-253
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Zheng Shao
SELECT * FROM t WHERE rand() < 0.01;
Hive will say "No need to submit job", because the condition is evaluated once during compilation and that single value happens to be false.
The rand() function is special in that it evaluates to a different value on every call, so we should not compute its value in the compilation phase.
One way to do that is to add an annotation to UDFRand and check for it in the compilation phase.
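To make the failure mode concrete, here is a minimal sketch (not Hive code) that contrasts evaluating the predicate rand() < 0.01 once up front with evaluating it once per row. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.function.DoubleSupplier;

// Hypothetical illustration, not Hive code: contrasts folding "rand() < 0.01"
// into a constant with evaluating it once per row.
final class RandFoldingDemo {

    // Problematic: the predicate is evaluated a single time and the result is
    // reused for every row, so the filter keeps either all rows or none.
    static long countWithPrecomputedPredicate(long rows, DoubleSupplier rand) {
        boolean keep = rand.getAsDouble() < 0.01;
        return keep ? rows : 0L;
    }

    // Intended: the predicate is evaluated for each row, so roughly 1% of the
    // rows pass when rand is uniform on [0, 1).
    static long countWithPerRowPredicate(long rows, DoubleSupplier rand) {
        long kept = 0L;
        for (long i = 0L; i < rows; i++) {
            if (rand.getAsDouble() < 0.01) {
                kept++;
            }
        }
        return kept;
    }
}
```

Passing `new java.util.Random()::nextDouble` as the supplier, the first method almost always returns 0 (the single draw is below 0.01 only about 1% of the time), which corresponds to the "No need to submit job" outcome described above.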
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-253) rand() gets precomputated in
compilation phase
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683123#action_12683123 ]
Namit Jain commented on HIVE-253:
---------------------------------
running tests for committing now
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Assigned: (HIVE-253) rand() gets precomputated in
compilation phase
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo reassigned HIVE-253:
----------------------------------
Assignee: Ashish Thusoo
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Ashish Thusoo
> Priority: Blocker
> Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-253:
----------------------------------
Status: Patch Available (was: Open)
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-253:
----------------------------------
Attachment: hive-253.3.patch
The problem was a bug in the patch. In the process of trying to not duplicate code, I introduced a bug where not all UDFs were getting nulled in the pruneExpr. I have now fixed the condition where the nodes are replaced with null.
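One plausible reading of "replaced with null" is three-valued logic: a sub-expression the pruner must not evaluate becomes "unknown" (null), and a partition is dropped only when the whole predicate is definitely false. The sketch below illustrates that idea under this assumption. Every type and method in it is hypothetical and does not come from the patch or from Hive's pruneExpr.

```java
import java.util.Map;

// Hypothetical expression model, for illustration only.
interface Expr {}

final class PartitionEquals implements Expr {
    final String column;
    final String value;
    PartitionEquals(String column, String value) { this.column = column; this.value = value; }
}

// Stands for any predicate built on a non-deterministic UDF, e.g. rand() < 0.01.
final class NonDeterministicCall implements Expr {}

final class And implements Expr {
    final Expr left;
    final Expr right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class Or implements Expr {
    final Expr left;
    final Expr right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class PrunerSketch {
    // Returns TRUE or FALSE when partition values alone decide the predicate,
    // and null ("unknown") when it depends on a node the pruner must not evaluate.
    static Boolean evaluate(Expr e, Map<String, String> partition) {
        if (e instanceof NonDeterministicCall) {
            return null; // the node is "nulled" instead of being computed
        }
        if (e instanceof PartitionEquals) {
            PartitionEquals p = (PartitionEquals) e;
            return p.value.equals(partition.get(p.column));
        }
        if (e instanceof And) {
            And a = (And) e;
            Boolean l = evaluate(a.left, partition);
            Boolean r = evaluate(a.right, partition);
            if (Boolean.FALSE.equals(l) || Boolean.FALSE.equals(r)) return Boolean.FALSE;
            if (l == null || r == null) return null;
            return Boolean.TRUE;
        }
        if (e instanceof Or) {
            Or o = (Or) e;
            Boolean l = evaluate(o.left, partition);
            Boolean r = evaluate(o.right, partition);
            if (Boolean.TRUE.equals(l) || Boolean.TRUE.equals(r)) return Boolean.TRUE;
            if (l == null || r == null) return null;
            return Boolean.FALSE;
        }
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    // A partition is pruned only when the predicate is definitely false.
    static boolean keepPartition(Expr predicate, Map<String, String> partition) {
        return !Boolean.FALSE.equals(evaluate(predicate, partition));
    }
}
```

Under this scheme a predicate consisting only of the non-deterministic call keeps every partition, while an AND with a partition-column test that is false still allows the partition to be pruned.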
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo updated HIVE-253:
-------------------------------
Assignee: Raghotham Murthy (was: Ashish Thusoo)
Load balancing...
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Venky Iyer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venky Iyer updated HIVE-253:
----------------------------
Priority: Blocker (was: Major)
Marking as blocker per athusoo's recommendation.
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Zheng Shao
> Priority: Blocker
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Commented: (HIVE-253) rand() gets precomputated in
compilation phase
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682483#action_12682483 ]
Namit Jain commented on HIVE-253:
---------------------------------
Looks good. I have some minor comments:
1. No need to include UDFRand in PartitionPruner.
2. Test rand_partitionpruner.q:
Can you add that on a partitioned table, srcpart, instead?
Can you add an explain plan which will show that all partitions are included?
3. Can you add another test for AND/OR of some expression and rand(), like
Again, can you add an explain plan?
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo updated HIVE-253:
-------------------------------
Affects Version/s: 0.2.0
Fix Version/s: 0.2.0
Marking this for 0.2.0 version.
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Ashish Thusoo
> Priority: Blocker
> Fix For: 0.2.0
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Commented: (HIVE-253) rand() gets precomputated in
compilation phase
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682536#action_12682536 ]
Ashish Thusoo commented on HIVE-253:
------------------------------------
I am getting some failures in the tests...
[junit] Begin query: input_testxpath2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] Exception: Client Execution failed with error code = 10
[junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 10
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath2(TestCliDriver.java:3815)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
[junit] Hive history file=/data/users/athusoo/commits/hive_trunk_ws6/ql/../build/ql/tmp/hive_job_log_athusoo
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-253:
----------------------------------
Attachment: hive-253.1.patch
Created an annotation called UDFType. If a UDF is not deterministic, don't include it in the partition pruner.
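As a rough sketch of how such an annotation could be declared and checked, assuming a boolean attribute named deterministic that defaults to true (the attribute name, the default, and the helper classes below are assumptions for illustration, not taken from the patch):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for the annotation described above; its shape is assumed.
@Retention(RetentionPolicy.RUNTIME) // must survive to runtime so reflection can read it
@Target(ElementType.TYPE)
@interface UDFType {
    boolean deterministic() default true;
}

// Hypothetical UDF classes used only for this illustration.
@UDFType(deterministic = false)
final class RandUdf {
    private final java.util.Random random = new java.util.Random();

    public double evaluate() {
        return random.nextDouble();
    }
}

final class UpperUdf { // no annotation, so it is treated as deterministic
    public String evaluate(String s) {
        return s == null ? null : s.toUpperCase();
    }
}

final class UdfDeterminism {
    // A UDF class counts as deterministic unless it is annotated otherwise.
    static boolean isDeterministic(Class<?> udfClass) {
        UDFType type = udfClass.getAnnotation(UDFType.class);
        return type == null || type.deterministic();
    }
}
```

A compile-time component such as a partition pruner could call isDeterministic on the UDF class and skip evaluation when it returns false. Defaulting to deterministic means existing UDFs need no change, and only functions like rand() have to opt out.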
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-253:
----------------------------
Resolution: Fixed
Fix Version/s: 0.3.0 (was: 0.2.0)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks Raghu!
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch, hive-253.3.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation
phase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-253:
----------------------------------
Attachment: hive-253.2.patch
Incorporated Namit's comments. Created a couple more test cases.
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.
[jira] Commented: (HIVE-253) rand() gets precomputated in
compilation phase
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682527#action_12682527 ]
Namit Jain commented on HIVE-253:
---------------------------------
+1
looks good
> rand() gets precomputated in compilation phase
> ----------------------------------------------
>
> Key: HIVE-253
> URL: https://issues.apache.org/jira/browse/HIVE-253
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.2.0
> Reporter: Zheng Shao
> Assignee: Raghotham Murthy
> Priority: Blocker
> Fix For: 0.2.0
>
> Attachments: hive-253.1.patch, hive-253.2.patch
>
>
> SELECT * FROM t WHERE rand() < 0.01;
> Hive will say: "No need to submit job", because the condition evaluates to false.
> The rand() function is special in the sense that every time it evaluates to a different value. We should disallow computing the value in the compiling phase.
> One way to do that is to add an annotation in the UDFRand and check that in the compiling phase.