Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/01/11 20:29:54 UTC

[jira] Created: (HIVE-1039) multi-insert doesn't work for local directories

multi-insert doesn't work for local directories
-----------------------------------------------

                 Key: HIVE-1039
                 URL: https://issues.apache.org/jira/browse/HIVE-1039
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Ning Zhang
            Assignee: Ning Zhang


As wd pointed out in hive-user, the following query only loads data into the first local directory. Multi-insert into tables works fine.

hive> from test
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/0' select * where a = 1
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/stefdong/tmp/1' select * where a = 3;
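The expected result of this query can be sketched as a toy Python model. It assumes a hypothetical in-memory `test` table with an integer column `a` and describes intended semantics only, not Hive's execution engine. Multi-insert scans the source once and evaluates every INSERT branch's WHERE clause against each row, so each local directory should receive its own filtered rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def multi_insert(source: List[Row],
                 branches: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Toy model of FROM ... INSERT ... INSERT ...: one scan, many sinks."""
    outputs: Dict[str, List[Row]] = {dest: [] for dest in branches}
    for row in source:  # single pass over the source table
        for dest, predicate in branches.items():
            if predicate(row):  # each branch applies its own WHERE clause
                outputs[dest].append(row)
    return outputs


# Hypothetical contents of table `test`.
test = [{"a": 1}, {"a": 2}, {"a": 3}, {"a": 1}]
result = multi_insert(test, {
    "/home/stefdong/tmp/0": lambda r: r["a"] == 1,
    "/home/stefdong/tmp/1": lambda r: r["a"] == 3,
})
for dest, rows in result.items():
    print(dest, len(rows))
```

In this model both destinations end up populated. The reported bug is that only the first directory actually received data.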


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799023#action_12799023 ] 

Namit Jain commented on HIVE-1039:
----------------------------------

Looks OK, but can you add the following tests:

1. hive.merge.mapfiles = true
multi-table insert query (map only)

2. hive.merge.mapfiles = false
multi-table insert query (map only)

3. hive.merge.mapfiles = true
multi-directory insert query (map only)

4. hive.merge.mapfiles = false
multi-directory insert query (map only)

5. hive.merge.mapredfiles = true
multi-table insert query (map-reduce)

6. hive.merge.mapredfiles = false
multi-table insert query (map-reduce)
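One way to keep the six cases above straight is to generate the test scripts from a small parameter table. This is a hypothetical helper, not part of any patch. It uses the parameter names hive.merge.mapfiles and hive.merge.mapredfiles that appear later in this thread, borrows the table names src_multi1/src_multi2 from a later comment, and uses a made-up directory path. A GROUP BY is assumed as the simplest way to force a reduce stage for the map-reduce cases.

```python
from typing import List, Tuple

# (merge parameter, value, target kind, job kind) for cases 1-6.
CASES: List[Tuple[str, bool, str, str]] = [
    ("hive.merge.mapfiles", True, "table", "map-only"),
    ("hive.merge.mapfiles", False, "table", "map-only"),
    ("hive.merge.mapfiles", True, "directory", "map-only"),
    ("hive.merge.mapfiles", False, "directory", "map-only"),
    ("hive.merge.mapredfiles", True, "table", "map-reduce"),
    ("hive.merge.mapredfiles", False, "table", "map-reduce"),
]


def target(kind: str, i: int) -> str:
    if kind == "table":
        return f"table src_multi{i}"
    return f"local directory '/tmp/hive1039/{i}'"  # hypothetical path


def branch(kind: str, job: str, i: int, where: str) -> str:
    # A GROUP BY adds a reduce stage; without it the insert is map-only.
    group_by = " group by key, value" if job == "map-reduce" else ""
    return f"insert overwrite {target(kind, i)} select key, value where {where}{group_by}"


def script(param: str, value: bool, kind: str, job: str) -> str:
    return "\n".join([
        f"set {param}={'true' if value else 'false'};",
        "from src",
        branch(kind, job, 1, "key < 10"),
        branch(kind, job, 2, "key > 10 and key < 20") + ";",
    ])


scripts = [script(*case) for case in CASES]
print(scripts[4])
```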





[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1039:
-----------------------------

    Attachment: HIVE-1039_2.patch

Uploading a new patch that includes the unit tests mentioned above.



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1039:
-----------------------------

    Attachment: HIVE-1039.patch

Uploading patch HIVE-1039.patch. This fixes the issue where only one moveTask was added when multiple local directories appear in a multi-insert.
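The invariant this patch restores is that every file sink in a multi-insert gets its own move step from its temporary output location to its final destination. Below is a toy Python model of that invariant. The `FileSink` and `MoveTask` types and the scratch paths are hypothetical, and Hive's real plan classes differ. The "buggy" planner only mimics the reported symptom; it is not Hive's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileSink:
    tmp_dir: str     # where the job writes its output
    final_dir: str   # where the user asked the data to go
    is_local: bool


@dataclass(frozen=True)
class MoveTask:
    source: str
    target: str


def plan_moves_buggy(sinks: List[FileSink]) -> List[MoveTask]:
    # Mimics the symptom: local destinations after the first are dropped.
    moves: List[MoveTask] = []
    seen_local = False
    for s in sinks:
        if s.is_local and seen_local:
            continue
        seen_local = seen_local or s.is_local
        moves.append(MoveTask(s.tmp_dir, s.final_dir))
    return moves


def plan_moves_fixed(sinks: List[FileSink]) -> List[MoveTask]:
    # One move per file sink, regardless of destination type.
    return [MoveTask(s.tmp_dir, s.final_dir) for s in sinks]


sinks = [
    FileSink("/scratch/out0", "/home/stefdong/tmp/0", True),  # hypothetical tmp paths
    FileSink("/scratch/out1", "/home/stefdong/tmp/1", True),
]
print(len(plan_moves_buggy(sinks)), len(plan_moves_fixed(sinks)))
```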



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1039:
-----------------------------

    Attachment: HIVE-1039_3.patch

Discussed with Namit offline and found a larger issue in multi-insert for tables as well when hive.merge.mapfiles=false. The problem is that each TableScanOperator creates a new task as currTask. Any operator that takes multiple parent topOps should merge those tasks into one. Currently GenMRUnion1 does not merge currTask, which is what causes this problem. We decided to fix this issue in GenMRFileSink1() for 0.5.0, as this patch does, and I will file another JIRA for trunk to merge the tasks in GenMRUnion1.
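The merging rule described above can be illustrated with a small union-find over task ids. Each table scan starts its own task, and an operator with several parents (such as a UNION) collapses its parents' tasks into one, so that every downstream file sink hangs off a single task. This is a hypothetical sketch of the idea only; it is not how GenMRUnion1 or GenMRFileSink1 are implemented.

```python
from typing import Dict, List


class TaskMerger:
    """Union-find over task ids: merged tasks share one representative."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def new_task(self) -> int:
        task_id = len(self.parent)
        self.parent[task_id] = task_id
        return task_id

    def find(self, t: int) -> int:
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]  # path halving
            t = self.parent[t]
        return t

    def merge(self, tasks: List[int]) -> int:
        root = self.find(tasks[0])
        for t in tasks[1:]:
            self.parent[self.find(t)] = root
        return root


m = TaskMerger()
scan_a = m.new_task()  # table scan feeding the first UNION input
scan_b = m.new_task()  # table scan feeding the second UNION input
union_task = m.merge([scan_a, scan_b])  # operator with multiple parent topOps
sink1_task = m.find(scan_a)
sink2_task = m.find(scan_b)
print(sink1_task == sink2_task == union_task)
```

Without the `merge` call, the two scans would keep distinct tasks, and sinks below the union could end up attached to different tasks.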

HIVE-1039_3.patch contains the fix for the broader problem described above, plus a minor fix that checks hive.merge.mapredfiles together with the existence of a reducer. More tests are also added to cover multi-insert involving UNION and all combinations of the two hive.merge parameters.
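"All combinations of the two hive.merge parameters" is a 2 x 2 matrix. The short sketch below enumerates that matrix and prefixes each combination to the UNION-based multi-insert query Ning posts elsewhere in this thread. The helper is hypothetical; only the parameter names and the query come from the thread.

```python
from itertools import product
from typing import List

QUERY = """from (select * from src union all select * from src) s
insert overwrite table src_multi1 select * where key < 10
insert overwrite table src_multi2 select * where key > 10 and key < 20;"""


def merge_matrix(query: str) -> List[str]:
    scripts: List[str] = []
    # product([True, False], repeat=2) yields all four (mapfiles, mapredfiles) pairs.
    for mapfiles, mapredfiles in product([True, False], repeat=2):
        settings = (
            f"set hive.merge.mapfiles={str(mapfiles).lower()};\n"
            f"set hive.merge.mapredfiles={str(mapredfiles).lower()};\n"
        )
        scripts.append(settings + query)
    return scripts


for s in merge_matrix(QUERY):
    print(s.splitlines()[:2])
```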



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1039:
---------------------------------

      Component/s: Query Processor
    Fix Version/s:     (was: 0.6.0)



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1039:
-----------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1039:
-----------------------------

    Fix Version/s: 0.6.0
                   0.5.0



[jira] Commented: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799429#action_12799429 ] 

Namit Jain commented on HIVE-1039:
----------------------------------

+1

Looks good. Will commit to both 0.5 and trunk if the tests pass.



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1039:
-----------------------------

    Affects Version/s: 0.6.0
                       0.5.0



[jira] Updated: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1039:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks, Ning.



[jira] Commented: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799298#action_12799298 ] 

Ning Zhang commented on HIVE-1039:
----------------------------------

Namit, do you mean something like this?

from (select * from src union all select * from src) s
insert overwrite table src_multi1 select * where key < 10
insert overwrite table src_multi2 select * where key > 10 and key < 20;
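The output this query should produce can be checked against a toy model. The model assumes a hypothetical `src` with integer keys; in Hive's test data the key column is a string that gets compared numerically. UNION ALL over the same table keeps duplicates, so each qualifying row appears twice in its target. Because the predicates are `key < 10` and `key > 10 and key < 20`, a row with key exactly 10 lands in neither table.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (key, value); keys are ints here for simplicity


def union_all(a: List[Row], b: List[Row]) -> List[Row]:
    return a + b  # UNION ALL keeps duplicates


def run(src: List[Row]) -> Tuple[List[Row], List[Row]]:
    s = union_all(src, src)
    src_multi1 = [r for r in s if r[0] < 10]
    src_multi2 = [r for r in s if 10 < r[0] < 20]
    return src_multi1, src_multi2


src = [(5, "val_5"), (10, "val_10"), (15, "val_15"), (25, "val_25")]
multi1, multi2 = run(src)
print(len(multi1), len(multi2))
```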





[jira] Commented: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799336#action_12799336 ] 

Namit Jain commented on HIVE-1039:
----------------------------------

This is needed for 0.5 as well.



[jira] Commented: (HIVE-1039) multi-insert doesn't work for local directories

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799256#action_12799256 ] 

Namit Jain commented on HIVE-1039:
----------------------------------

@Ning, can you also test a union over the same table (map only) combined with a multi-table insert?
