Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/02/26 07:25:01 UTC
[jira] Created: (HIVE-308) UNION ALL should create different destination directories for different operands
UNION ALL should create different destination directories for different operands
--------------------------------------------------------------------------------
Key: HIVE-308
URL: https://issues.apache.org/jira/browse/HIVE-308
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.2.0, 0.3.0
Reporter: Zheng Shao
The following query hangs:
{code}
select * from (select 1 from zshao_lazy union all select 2 from zshao_lazy) a;
{code}
The following query produces wrong results (one map-reduce job overwrites, or fails to overwrite, the result of the other):
{code}
select * from (select 1 as id from zshao_lazy cluster by id union all select 2 as id from zshao_meta) a;
{code}
The cause of both is that the destination directories of the file sink operators conflict with each other.
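The following minimal Python model makes the conflict concrete. It is not Hive's code. It assumes, hypothetically, that each file sink operator names its temporary output from its destination directory plus the task attempt id, which is the shape of the paths in the task log later in this thread. The helpers `tmp_file_path` and `per_operand_dest`, and the `union-N` subdirectory naming, are illustrative inventions. The second helper shows the fix proposed in the issue title: one destination directory per operand.

```python
import posixpath

def tmp_file_path(dest_dir: str, task_attempt_id: str) -> str:
    """Hypothetical naming rule: one temp file per (destination dir, task attempt)."""
    return posixpath.join(dest_dir, "_tmp." + task_attempt_id)

def per_operand_dest(dest_dir: str, operand_index: int) -> str:
    """Hypothetical fix: give each UNION ALL operand its own subdirectory."""
    return posixpath.join(dest_dir, "union-" + str(operand_index))

shared_dest = "/tmp/hive-zshao/_tmp.92566742.10001.insclause-0"
task = "10549_m_000000_0"

# Both operands run in the same map task and share one destination directory,
# so both file sinks compute the identical temp file path.
before = [tmp_file_path(shared_dest, task) for _ in range(2)]
print(len(set(before)))  # 1

# With one directory per operand, the two paths no longer collide.
after = [tmp_file_path(per_operand_dest(shared_dest, i), task) for i in range(2)]
print(len(set(after)))  # 2
```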
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-308:
----------------------------
Affects Version/s: (was: 0.2.0)
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680146#action_12680146 ]
Johan Oskarsson commented on HIVE-308:
--------------------------------------
This patch seems to have broken the nightly build; see the test TestCliDriver.testCliDriver_union3.
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/26/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_union3/
[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679992#action_12679992 ]
Namit Jain commented on HIVE-308:
---------------------------------
I saw your other mail just now - if you are in a hurry, go ahead.
The changes look good
+1
[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-308:
----------------------------
Resolution: Fixed
Fix Version/s: 0.3.0
Assignee: Zheng Shao
Release Note: HIVE-308. UNION ALL: FileSinkOperator now adds files in case the target exists. (zshao)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed revision 751583.
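The release note describes the committed approach: FileSinkOperator adds files when the target already exists, instead of overwriting it. The following Python sketch illustrates that idea on a local filesystem, standing in for HDFS. It is not the code in revision 751583. The function name `move_or_add_files` and the `_copy_N` rename scheme for clashing file names are hypothetical.

```python
import os
import shutil
import tempfile

def move_or_add_files(src_dir: str, dst_dir: str) -> None:
    """Publish src_dir's files into dst_dir without clobbering existing output.

    If dst_dir does not exist, the whole directory is moved into place.
    Otherwise each file is added next to the existing ones, and a name
    clash gets a hypothetical "_copy_N" suffix instead of overwriting.
    """
    if not os.path.exists(dst_dir):
        shutil.move(src_dir, dst_dir)
        return
    for name in sorted(os.listdir(src_dir)):
        target = os.path.join(dst_dir, name)
        n = 1
        while os.path.exists(target):
            target = os.path.join(dst_dir, f"{name}_copy_{n}")
            n += 1
        shutil.move(os.path.join(src_dir, name), target)
    os.rmdir(src_dir)  # src_dir is empty once every file has been moved

# Two UNION ALL operands each produce a file with the same task-based name.
root = tempfile.mkdtemp()
out = os.path.join(root, "out")
for operand, value in (("a", "1"), ("b", "2")):
    tmp = os.path.join(root, "tmp_" + operand)
    os.makedirs(tmp)
    with open(os.path.join(tmp, "000000_0"), "w") as f:
        f.write(value + "\n")
    move_or_add_files(tmp, out)

files = sorted(os.listdir(out))
contents = []
for name in files:
    with open(os.path.join(out, name)) as f:
        contents.append(f.read())
shutil.rmtree(root)
print(files)     # ['000000_0', '000000_0_copy_1']
print(contents)  # ['1\n', '2\n']
```

With this approach, both operands' rows survive in a single target directory. The separate per-operand directories proposed in the issue title are not needed.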
[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679990#action_12679990 ]
Namit Jain commented on HIVE-308:
---------------------------------
Zheng, there are a lot of problems with union, and I am in the process of fixing them in:
https://issues.apache.org/jira/browse/HIVE-318
Some corner cases are not working yet, and I should hopefully be done in a day or 2.
Can you hold on to this patch? Let us look at these 2 patches together and then decide.
[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-308:
----------------------------
Priority: Blocker (was: Major)
Raising to critical since this may produce wrong query results.
[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677499#action_12677499 ]
Zheng Shao commented on HIVE-308:
---------------------------------
The problem with the first query (a map-only job) is that there are 2 file sink operators.
See the log:
2009-02-27 11:55:42,528 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2009-02-27 11:55:42,595 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2009-02-27 11:55:42,612 INFO org.apache.hadoop.mapred.MapTask: split: hdfs://xxxx:9000/warehouse/zshao_lazy/8413_m_000000_0.gz, range: 0-28
2009-02-27 11:55:42,631 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2009-02-27 11:55:42,632 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2009-02-27 11:55:42,789 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self
2009-02-27 11:55:42,793 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias null-subquery2:a-subquery2:zshao_lazy to work list for file /warehouse/zshao_lazy/8413_m_000000_0.gz
2009-02-27 11:55:42,793 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias null-subquery1:a-subquery1:zshao_lazy to work list for file /warehouse/zshao_lazy/8413_m_000000_0.gz
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Got partitions: null
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing children:
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self
2009-02-27 11:55:42,804 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
2009-02-27 11:55:42,813 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,818 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self
2009-02-27 11:55:42,819 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
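The telling lines are the two `FileSinkOperator: Writing to temp file:` entries, which name the same path. The following small Python helper can flag such collisions in a task log. It is a hypothetical diagnostic, not part of Hive. The sample input reuses three lines from the log above.

```python
from collections import Counter

MARKER = "FileSinkOperator: Writing to temp file: "

def duplicate_temp_files(log_lines):
    """Return temp-file paths opened by more than one FileSinkOperator."""
    paths = [line.split(MARKER, 1)[1].strip()
             for line in log_lines if MARKER in line]
    return [path for path, count in Counter(paths).items() if count > 1]

log = [
    "2009-02-27 11:55:42,804 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0",
    "2009-02-27 11:55:42,813 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done",
    "2009-02-27 11:55:42,819 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0",
]
print(duplicate_temp_files(log))
# ['/tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0']
```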
[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677500#action_12677500 ]
Zheng Shao commented on HIVE-308:
---------------------------------
Explain extended also shows 2 file sink operators. But it's possible that these 2 are really the same operator, displayed TWICE (and also initialized TWICE during query execution).
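The possibility raised here, a single operator being displayed and initialized TWICE, is what happens when code treats a plan DAG as a tree. The following minimal Python sketch uses a hypothetical `Op` class, not Hive's operator classes. In it, both UNION ALL branches feed one shared file sink. A naive recursive walk reaches that sink once per parent, while a walk that tracks visited nodes reaches it once. The thread does not establish whether Hive's plan walker behaved this way.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Op:
    """Hypothetical stand-in for an operator node in a query plan."""
    name: str
    children: list = field(default_factory=list)

def walk_naive(op, out):
    # Recurses along every edge, so a node with two parents is visited twice.
    out.append(op.name)
    for child in op.children:
        walk_naive(child, out)
    return out

def walk_once(op, out, seen=None):
    # Tracks visited nodes by identity, so a shared node is visited once.
    seen = set() if seen is None else seen
    if id(op) in seen:
        return out
    seen.add(id(op))
    out.append(op.name)
    for child in op.children:
        walk_once(child, out, seen)
    return out

# Two scan/select branches (one per UNION ALL operand) share one file sink.
fs = Op("FS")
root = Op("MAP", [Op("TS1", [Op("SEL1", [fs])]), Op("TS2", [Op("SEL2", [fs])])])
print(walk_naive(root, []).count("FS"))  # 2
print(walk_once(root, []).count("FS"))   # 1
```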
[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-308:
----------------------------
Attachment: HIVE-308.1.patch
Fixes the bug (the second case) and adds a test case.