Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/02/26 07:25:01 UTC

[jira] Created: (HIVE-308) UNION ALL should create different destination directories for different operands

UNION ALL should create different destination directories for different operands
--------------------------------------------------------------------------------

                 Key: HIVE-308
                 URL: https://issues.apache.org/jira/browse/HIVE-308
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.2.0, 0.3.0
            Reporter: Zheng Shao


The following query hangs:
{code} 
select * from (select 1 from zshao_lazy union all select 2 from zshao_lazy) a;
{code} 

The following query produces wrong results (one map-reduce job overwrites, or fails to overwrite, the result of the other):
{code} 
select * from (select 1 as id from zshao_lazy cluster by id union all select 2 as id from zshao_meta) a;
{code} 

Both problems have the same cause: the destination directories of the file sink operators conflict with each other.
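
The conflict can be illustrated with a small sketch. This is not Hive code: the function names, the "operand-N" subdirectory layout, and the paths are assumptions made up for illustration. It only shows why one shared destination per task collides, and why a separate directory per UNION ALL operand would not.

```python
import posixpath


def shared_sink_path(tmp_dir, task_id):
    """Every operand derives the same path for a given task (the conflict)."""
    return posixpath.join(tmp_dir, "_tmp." + task_id)


def per_operand_sink_path(tmp_dir, operand_index, task_id):
    """Hypothetical layout: one subdirectory per UNION ALL operand."""
    return posixpath.join(tmp_dir, "operand-%d" % operand_index, "_tmp." + task_id)


tmp_dir = "/tmp/hive-example/insclause-0"  # illustrative path, not from a real run
task_id = "000000_0"

# Two operands of the UNION ALL, each with its own file sink.
shared = [shared_sink_path(tmp_dir, task_id) for _ in range(2)]
separate = [per_operand_sink_path(tmp_dir, i, task_id) for i in range(2)]

print(len(set(shared)))    # number of distinct files with a shared directory
print(len(set(separate)))  # number of distinct files with per-operand directories
```

With the shared layout both sinks resolve to a single file, so one writer blocks or clobbers the other; with per-operand directories each sink gets its own file.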


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-308:
----------------------------

    Affects Version/s:     (was: 0.2.0)
               Status: Patch Available  (was: Open)

[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680146#action_12680146 ] 

Johan Oskarsson commented on HIVE-308:
--------------------------------------

This patch seems to have broken the nightly build; see test TestCliDriver.testCliDriver_union3.
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/26/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_union3/

[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679992#action_12679992 ] 

Namit Jain commented on HIVE-308:
---------------------------------

I saw your other mail just now - if you are in a hurry, go ahead.
The changes look good.

+1

[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-308:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.3.0
         Assignee: Zheng Shao
     Release Note: HIVE-308. UNION ALL: FileSinkOperator now adds files in case the target exists. (zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed revision 751583.
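
The behaviour described in the release note ("adds files in case the target exists") can be sketched roughly as follows. This is not the committed patch: commit_output is a hypothetical helper, and the local filesystem stands in for HDFS. The assumption is simply that a job's temp directory is renamed into place when the target is absent, and that its files are moved into the target when an earlier job already created it.

```python
import os
import shutil
import tempfile


def commit_output(tmp_dir, target_dir):
    """Move a job's temp output into target_dir without replacing the directory.

    Hypothetical helper: file name collisions inside target_dir are not handled.
    """
    if not os.path.exists(target_dir):
        # First job to finish: a plain rename is enough.
        os.rename(tmp_dir, target_dir)
        return
    # Target already exists: add this job's files next to the existing ones.
    for name in sorted(os.listdir(tmp_dir)):
        shutil.move(os.path.join(tmp_dir, name), os.path.join(target_dir, name))
    os.rmdir(tmp_dir)


root = tempfile.mkdtemp()
target = os.path.join(root, "insclause-0")

# Two jobs (one per UNION ALL operand) each produce one output file.
for job, filename in enumerate(["000000_0", "000001_0"]):
    tmp = os.path.join(root, "_tmp.job%d" % job)
    os.makedirs(tmp)
    with open(os.path.join(tmp, filename), "w") as f:
        f.write("row from job %d\n" % job)
    commit_output(tmp, target)

result = sorted(os.listdir(target))
print(result)
shutil.rmtree(root)
```

After both commits the target directory holds the output of both jobs, instead of the second job failing or replacing the first job's result.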

[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679990#action_12679990 ] 

Namit Jain commented on HIVE-308:
---------------------------------

Zheng, there are a lot of problems with union, and I am in the process of fixing them in:

https://issues.apache.org/jira/browse/HIVE-318

Some corner cases are not working, and I should hopefully be done in a day or two.
Can you hold this patch for now? Let us look at the two patches together and then decide.

[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-308:
----------------------------

    Priority: Blocker  (was: Major)

Raising the priority to Blocker since this may produce wrong query results.

[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677499#action_12677499 ] 

Zheng Shao commented on HIVE-308:
---------------------------------

The problem with the first query (a map-only job) is that we have 2 file sink operators, and both write to the same temp file.
See the log:

2009-02-27 11:55:42,528 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2009-02-27 11:55:42,595 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2009-02-27 11:55:42,612 INFO org.apache.hadoop.mapred.MapTask: split: hdfs://xxxx:9000/warehouse/zshao_lazy/8413_m_000000_0.gz, range: 0-28
2009-02-27 11:55:42,631 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2009-02-27 11:55:42,632 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2009-02-27 11:55:42,789 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self
2009-02-27 11:55:42,793 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias null-subquery2:a-subquery2:zshao_lazy to work list for file /warehouse/zshao_lazy/8413_m_000000_0.gz
2009-02-27 11:55:42,793 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias null-subquery1:a-subquery1:zshao_lazy to work list for file /warehouse/zshao_lazy/8413_m_000000_0.gz
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Got partitions: null
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing Self
2009-02-27 11:55:42,802 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing children:
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,803 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self
2009-02-27 11:55:42,804 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
2009-02-27 11:55:42,813 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.ForwardOperator: Initializing children:
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self
2009-02-27 11:55:42,817 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children:
2009-02-27 11:55:42,818 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self
2009-02-27 11:55:42,819 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
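
A quick way to spot this in a task log is to count how often each temp file path appears in the "Writing to temp file" lines. The sketch below is only a log-scanning aid, not part of Hive; duplicate_temp_files is a made-up helper name, and the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# The two FileSinkOperator lines from the task log above.
LOG = """\
2009-02-27 11:55:42,804 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
2009-02-27 11:55:42,819 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: /tmp/hive-zshao/_tmp.92566742.10001.insclause-0/_tmp.10549_m_000000_0
"""

PATTERN = re.compile(r"FileSinkOperator: Writing to temp file: (\S+)")


def duplicate_temp_files(log_text):
    """Return {path: count} for temp files opened by more than one file sink."""
    counts = Counter(PATTERN.findall(log_text))
    return {path: n for path, n in counts.items() if n > 1}


duplicates = duplicate_temp_files(LOG)
for path, n in duplicates.items():
    print("%s opened by %d file sink operators" % (path, n))
```

Any path reported here is being written by more than one file sink in the same task, which is the conflict seen in the first query.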



[jira] Commented: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677500#action_12677500 ] 

Zheng Shao commented on HIVE-308:
---------------------------------

EXPLAIN EXTENDED also shows 2 file sink operators. But it is possible that there is only 1, which gets displayed TWICE (and is also initialized TWICE during query execution).


[jira] Updated: (HIVE-308) UNION ALL should create different destination directories for different operands

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-308:
----------------------------

    Attachment: HIVE-308.1.patch

Fixed the bug (the second case) and added a test case.
