
Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/04/14 20:36:49 UTC

[jira] Created: (HIVE-1307) More generic and efficient merge method

More generic and efficient merge method
---------------------------------------

                 Key: HIVE-1307
                 URL: https://issues.apache.org/jira/browse/HIVE-1307
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.6.0
            Reporter: Ning Zhang
            Assignee: Ning Zhang
             Fix For: 0.6.0


Currently, if hive.merge.mapfiles/mapredfiles=true, a new MapReduce job is created that reads the input files and sends them to a single reducer for merging. This MR job is created at compile time, one MR job per partition. With dynamic partitions, however, the partitions are only created at execution time, so generating the merge MR jobs at compile time is impossible.

We should generalize the merge framework to handle multiple partitions. In most cases a map-only job should be sufficient if we use CombineHiveInputFormat.
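
A rough Java sketch of why planning the merge at execution time helps: once the insert has finished, its output files can simply be grouped by partition directory, one merge task per group, whether or not those partitions were known at compile time. The class name, method name and paths are hypothetical; this is an illustration, not Hive's planner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergePlanSketch {
    // Hypothetical helper: groups the files written by one insert by their
    // parent directory, which stands in for a partition (e.g. .../ds=1).
    // It works on the actual file list, so it also covers partitions that
    // were only created at execution time.
    static Map<String, List<String>> groupByPartition(List<String> files) {
        Map<String, List<String>> byPartition = new TreeMap<>();
        for (String file : files) {
            int slash = file.lastIndexOf('/');
            String dir = slash > 0 ? file.substring(0, slash) : "";
            byPartition.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/warehouse/t/ds=1/part-00000",
                "/warehouse/t/ds=1/part-00001",
                "/warehouse/t/ds=2/part-00000");
        // One merge task per partition: ds=1 merges 2 files, ds=2 merges 1.
        groupByPartition(files).forEach((dir, group) ->
                System.out.println(dir + " -> " + group.size() + " file(s)"));
    }
}
```

Running main prints "/warehouse/t/ds=1 -> 2 file(s)" followed by "/warehouse/t/ds=2 -> 1 file(s)".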

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1307:
---------------------------------

        Fix Version/s: 0.6.0
                           (was: 0.7.0)
    Affects Version/s:     (was: 0.6.0)
          Component/s: Query Processor

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902315#action_12902315 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

Yes, the merge[2-4].q.out files are the only difference in the 2nd patch.

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307_branch_0.6.patch

Uploading a patch for branch 0.6.

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1307:
-----------------------------

    Status: Open  (was: Patch Available)

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866655#action_12866655 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

Some design notes:

This task should benefit not only dynamic partition inserts but any insert that requires merging (hive.merge.mapfiles/mapredfiles=true). The idea is as follows:

The current merge job is a MapReduce job for each partition. The mappers just read the files and pass the rows along to a single reducer, which consolidates all inputs into a single stream. The extra work at the mapper/reducer boundary (e.g., copying, shuffling and sorting) is unnecessary.

With CombineHiveInputFormat, the merge job is map-only and can handle multiple partitions. The idea is that one mapper is generated for each partition. That mapper's input format is CombineHiveInputFormat, so it reads multiple files and writes them out to one file.

Since CombineHiveInputFormat depends on a Hadoop 0.20 feature, this feature relies on the shim layer to decide whether to use the new merge job (M, map-only) or the old one (MR). Because of this restriction, merging after a dynamic partition insert only works on Hadoop 0.20.
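
The shim-based choice between the two merge jobs can be sketched in Java roughly as below. MergeJobChooser, supportsCombineInput and the version parsing are hypothetical stand-ins for the shim layer; the sketch assumes the Hadoop 0.20 feature in question is CombineFileInputFormat and that version strings look like major.minor[.patch].

```java
public class MergeJobChooser {
    enum MergeJob { MAP_ONLY, MAP_REDUCE }

    // Hypothetical stand-in for the shim check: true when the Hadoop version
    // is 0.20 or later, where CombineFileInputFormat (assumed to be what
    // CombineHiveInputFormat builds on) is available.
    static boolean supportsCombineInput(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 0 || minor >= 20;
    }

    // New path (M): one map-only job with one mapper per partition, each
    // mapper reading all of its partition's files through a combined split.
    // Old path (MR): mappers forward rows to a single reducer that writes the
    // merged file, paying for copy, shuffle and sort.
    static MergeJob chooseMergeJob(String hadoopVersion) {
        return supportsCombineInput(hadoopVersion) ? MergeJob.MAP_ONLY : MergeJob.MAP_REDUCE;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.17.2", "0.20.2"}) {
            System.out.println(v + " -> " + chooseMergeJob(v));
        }
    }
}
```

On "0.17.2" this picks MAP_REDUCE, keeping the old one-reducer merge, and on "0.20.2" it picks MAP_ONLY.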

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.5.patch

Uploading HIVE-1307.5.patch, which should solve the 0.17 issue. I'm running the 0.17 tests now.

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900786#action_12900786 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Will start testing and reviewing again.

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900425#action_12900425 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Also, don't you need to pass the hadoopVersion in all the tests, not just CliDriver?

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.4.patch

Uploading HIVE-1307.4.patch, which:
 - updates to the latest revision
 - refactors GenMRFileSinkOperator
 - makes all tests shim-aware
I'm testing on Hadoop 0.20 now.

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900866#action_12900866 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

TestParse is failing on both 0.17 and 0.20.

On 0.17, the following tests are failing:

bucketmapjoin1.q
bucketmapjoin2.q
bucketmapjoin3.q

All of them only need log file updates. Can you fix the log files and submit a new patch?

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900926#action_12900926 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

OK. I thought only these three .q files were failing on 0.17. I'm rerunning TestParse.

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894601#action_12894601 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

I'm about to upload a new patch after more testing on real queries on real clusters. 

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900812#action_12900812 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

The patch applied cleanly.

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900884#action_12900884 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

Will regenerate the patch.

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1307:
-----------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Finally --

Committed. Thanks Ning

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.2.patch

Uploading a new full patch HIVE-1307.2.patch, containing the following additional changes:
 - more log file changes after svn up to the latest revision (mostly due to a conflict with another patch on lineage hooks).
 - minor change in FileUtils.java to include '{' and ']' as special characters to escape when they are used as partition column values.
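
A minimal Java sketch of escaping partition column values before they become directory names, assuming percent-style %XX escaping. The class name and most of the character set are assumptions for illustration; only '{' and ']' come from the note above, and the actual list in FileUtils.java may differ.

```java
public class PartitionValueEscaper {
    // Characters escaped in this sketch. '{' and ']' come from the patch
    // note; the remaining entries are an illustrative assumption.
    private static final String SPECIAL = "\"#%'*/:=?\\{[]^";

    // Replaces each special or control character with '%' followed by its
    // two-digit uppercase hex code, so the value is safe in a directory name.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c == 0x7F || SPECIAL.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("ds=" + escape("2010-04-14")); // ds=2010-04-14
        System.out.println("tag=" + escape("a{b]c"));     // tag=a%7Bb%5Dc
    }
}
```

Escaping '%' itself (to %25) keeps the encoding reversible, since a literal '%' in a value can no longer be confused with the start of an escape sequence.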

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.6.patch

Uploading HIVE-1307.6.patch, which applies cleanly to the current trunk.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900421#action_12900421 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

HIVE-1307.3.patch does not apply cleanly.

Can you regenerate it?

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.9.patch

Sigh, hopefully this is the last patch. I'm resolving some conflicts in bucketmapjoin[1-3].q.out for 0.17 and will run the 0.17 tests again.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Issue Comment Edited: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906 ] 

Namit Jain edited comment on HIVE-1307 at 8/20/10 7:04 PM:
-----------------------------------------------------------

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running OK for you?

      was (Author: namit):
    ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running 
  
> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900454#action_12900454 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Don't we need to do the comparison in checkPlan as well?

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Status: In Progress  (was: Patch Available)

There are additional log changes and a minor code change after the Hadoop 0.20 tests. I'll upload a new patch once the 0.17 tests finish.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900788#action_12900788 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

The patch does not apply cleanly; can you regenerate it?

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902309#action_12902309 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Ning, for Hadoop 0.20, merge2.q, merge3.q, and merge4.q are failing.

Can you upload the new patch?

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_branch_0.6.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.8.patch

Uploading HIVE-1307.8.patch, which cleans up the TestParse diffs on 0.17.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902313#action_12902313 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Is the only difference the new log files?

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_2_branch_0.6.patch, HIVE-1307_branch_0.6.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Status: Patch Available  (was: In Progress)

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.7.patch

Uploading HIVE-1307.7.patch. The only differences from the last one are the log changes in input[1-3].q.xml for 0.17 and input[2-3].q.xml for 0.20.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900817#action_12900817 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

All tests on 0.17 and 0.20 passed. There is an intermittent diff in index_compact_2.q on 0.20 when the tests run in parallel; when I ran it individually, it succeeded. I'm not sure whether this is due to parallel testing, so I will run the 0.20 tests sequentially again.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900473#action_12900473 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

Will address the code refactoring and update the patch.

Regarding hadoopVersion, the patch does add it to all tests (in ql/build.xml). checkPlan is changed to compare against different log files depending on hadoopVersion.
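
A rough illustration of comparing against version-specific expected outputs. The sketch picks an expected plan file based on a Hadoop version string such as "0.17.2.1". Several details are assumptions for illustration and are not taken from the patch:
- the directory layout (one subdirectory per major.minor version);
- the class name ExpectedPlanLocator;
- the matching rule.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: resolves the expected plan file for a query
 * according to the Hadoop version under test.
 */
public class ExpectedPlanLocator {
    private final Path baseDir;

    public ExpectedPlanLocator(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Reduces a full version such as "0.17.2.1" to its major.minor prefix, "0.17". */
    static String majorMinor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected Hadoop version: " + hadoopVersion);
        }
        return parts[0] + "." + parts[1];
    }

    /** For example, base/0.17/input1.q.xml when hadoopVersion is "0.17.2.1". */
    public Path expectedPlan(String queryFile, String hadoopVersion) {
        return baseDir.resolve(majorMinor(hadoopVersion)).resolve(queryFile + ".xml");
    }

    public static void main(String[] args) {
        ExpectedPlanLocator locator =
            new ExpectedPlanLocator(Paths.get("results", "compiler", "plan"));
        System.out.println(locator.expectedPlan("input1.q", "0.17.2.1"));
    }
}
```

Keying on major.minor rather than the full version string lets patch releases of the same Hadoop line share one set of expected outputs. That is a design assumption, not something stated in this thread.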

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1307:
-----------------------------

    Status: Open  (was: Patch Available)

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900773#action_12900773 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

OK, 0.17 tests passed. 

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900644#action_12900644 ] 

Ning Zhang commented on HIVE-1307:
----------------------------------

It's weird. 0.20 passed, but 0.17 failed mysteriously. Investigating.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.0.patch

Uploading a preliminary patch. This is not ready for review yet. 

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1307.0.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902504#action_12902504 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Committed to 0.6. Thanks, Ning.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_2_branch_0.6.patch, HIVE-1307_branch_0.6.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900418#action_12900418 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

Still looking, but here are a few comments:

1. Extra debug info in MapOperator.
2. createMap4Merge and createMapReduce4Merge have a lot of common code at the end. Can you combine it into a function?
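The second review point is a standard extract-method refactoring. The sketch below is purely hypothetical: Plan, buildMapOnlyMerge, buildMapReduceMerge and finishMergePlan are invented stand-ins (they are not the bodies of createMap4Merge or createMapReduce4Merge), and the plan steps are placeholder strings. It only shows the shape of the change: each builder keeps the steps that differ, and the shared tail lives in one helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Hive code): two merge-plan builders that used
// to end with the same block of steps now share one helper.
final class MergePlanBuilders {

    /** Minimal stand-in for a job plan; not a Hive class. */
    static final class Plan {
        final String kind;
        final List<String> steps = new ArrayList<>();

        Plan(String kind) {
            this.kind = kind;
        }
    }

    static Plan buildMapOnlyMerge(String inputDir, String outputDir) {
        Plan plan = new Plan("map-only");
        plan.steps.add("read small files from " + inputDir);
        return finishMergePlan(plan, outputDir);
    }

    static Plan buildMapReduceMerge(String inputDir, String outputDir) {
        Plan plan = new Plan("map-reduce");
        plan.steps.add("read small files from " + inputDir);
        plan.steps.add("shuffle all rows to a single reducer");
        return finishMergePlan(plan, outputDir);
    }

    /** The formerly duplicated tail: both builders end with these steps. */
    private static Plan finishMergePlan(Plan plan, String outputDir) {
        plan.steps.add("write merged file to " + outputDir);
        plan.steps.add("move merged output into the final location");
        return plan;
    }
}
```

Any later change to the common ending then only has to be made in finishMergePlan.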


> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1307:
---------------------------------

    Fix Version/s: 0.7.0
                       (was: 0.6.0)

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906 ] 

Namit Jain commented on HIVE-1307:
----------------------------------

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running 

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Status: Patch Available  (was: Open)

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.3.patch
                HIVE-1307.3_java.patch

Uploading HIVE-1307.3.patch and HIVE-1307.3_java.patch (Java changes only). This patch fixes a bug in dynamic partition inserts (adding the partition column property in GenMRFileSink1.java). It also adds a unit test case, merge4.q, for this case.
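The comment does not say how the partition column property is consumed. One plausible use, stated here as an assumption rather than a description of the patch, is mapping the directories a dynamic partition insert creates at run time back to partition values. The hypothetical Java sketch below (DynamicPartitionPaths and toPartitionSpec are invented names) assumes Hive's column=value directory layout, e.g. ds=2010-08-01/hr=12, and an ordered list of partition column names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the GenMRFileSink1 change: with the ordered list of
// partition columns, a relative dynamic-partition directory such as
// "ds=2010-08-01/hr=12" can be mapped back to a column -> value spec.
final class DynamicPartitionPaths {

    static Map<String, String> toPartitionSpec(List<String> partitionColumns, String relativeDir) {
        String[] levels = relativeDir.split("/");
        if (levels.length != partitionColumns.size()) {
            throw new IllegalArgumentException("expected " + partitionColumns.size()
                + " directory levels but got " + levels.length + ": " + relativeDir);
        }
        Map<String, String> spec = new LinkedHashMap<>();
        for (int i = 0; i < levels.length; i++) {
            int eq = levels[i].indexOf('=');
            String column = eq < 0 ? "" : levels[i].substring(0, eq);
            // Each level must be "<column>=<value>" for the column expected at that depth.
            if (!column.equals(partitionColumns.get(i))) {
                throw new IllegalArgumentException("unexpected directory level: " + levels[i]);
            }
            // Values are kept as-is; real Hive paths escape special characters,
            // which this sketch does not undo.
            spec.put(column, levels[i].substring(eq + 1));
        }
        return spec;
    }
}
```

With the column list available, a path whose levels are out of order, such as hr=12/ds=2010-08-01, is rejected with an IllegalArgumentException instead of being silently misread.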

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307_2_branch_0.6.patch

Uploading HIVE-1307_2_branch_0.6.patch, which includes merge[2-4].q.out (the expected outputs of merge2.q, merge3.q and merge4.q) for Hadoop 0.20.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_2_branch_0.6.patch, HIVE-1307_branch_0.6.patch, HIVE-1307_java_only.patch

[jira] Commented: (HIVE-1307) More generic and efficient merge method

Posted by "Venkatesh S (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894519#action_12894519 ] 

Venkatesh S commented on HIVE-1307:
-----------------------------------

Hey Ning, any update on this issue would be greatly appreciated.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch

[jira] Updated: (HIVE-1307) More generic and efficient merge method

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
-----------------------------

    Attachment: HIVE-1307.patch
                HIVE-1307_java_only.patch

Uploading the full patch, HIVE-1307.patch (lots of log changes), and a patch containing only the code changes, HIVE-1307_java_only.patch.

I'm running full tests again now.

> More generic and efficient merge method
> ---------------------------------------
>
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1307.0.patch, HIVE-1307.patch, HIVE-1307_java_only.patch