Posted to dev@hive.apache.org by "Kevin Wilfong (JIRA)" <ji...@apache.org> on 2012/06/16 03:34:42 UTC

[jira] [Created] (HIVE-3149) Dynamically generated paritions deleted by BlockMergeTask

Kevin Wilfong created HIVE-3149:
-----------------------------------

             Summary: Dynamically generated paritions deleted by BlockMergeTask
                 Key: HIVE-3149
                 URL: https://issues.apache.org/jira/browse/HIVE-3149
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Kevin Wilfong
            Assignee: Kevin Wilfong
            Priority: Critical
             Fix For: 0.10.0


When a query creates partitions in a table using dynamic partitions and a Block Merge Task is executed at the end of the query, some partitions may be lost.  Specifically, if the values of two or more dynamic partition keys end in the same sequence of digits, all but the largest will be dropped.

I was not able to confirm it, but I also suspect that if a MapReduce job is speculatively executed as part of the merge, the resulting duplicate data will not be deleted.

E.g.
insert overwrite table merge_dynamic_part partition (ds = '2008-04-08', hr)
select key, value, if(key % 2 == 0, 'a1', 'b1') as hr from srcpart_merge_dp_rc where ds = '2008-04-08';

In this query, if a Block Merge Task is executed at the end, only one of the partitions ds=2008-04-08/hr=a1 and ds=2008-04-08/hr=b1 will appear in the final table.
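
The symptom described above can be modeled with a short sketch. The Python below is a toy illustration only, not Hive's code: it assumes (hypothetically) that some cleanup step identifies outputs by the digits at the end of the partition value and keeps one output per group, and that "largest" means largest by data size. The function names and the sizes are invented for the example.

```python
import re
from collections import defaultdict


def trailing_digits(name):
    """Return the run of digits at the end of name, or None if there is none."""
    match = re.search(r"(\d+)$", name)
    return match.group(1) if match else None


def surviving_partitions(sizes):
    """Toy model of the reported symptom (an assumption, not Hive's logic).

    sizes maps a dynamic partition value (e.g. 'a1') to a hypothetical
    data size. Values that share the same trailing digits are grouped,
    and only the largest value in each group is kept.
    """
    groups = defaultdict(list)
    for value in sizes:
        digits = trailing_digits(value)
        # Values without trailing digits never collide in this model.
        key = ("digits", digits) if digits is not None else ("value", value)
        groups[key].append(value)
    return sorted(max(members, key=lambda v: sizes[v]) for members in groups.values())


# hr=a1 and hr=b1 both end in "1", so only one of them survives.
print(surviving_partitions({"a1": 120, "b1": 95}))  # ['a1']
```

With values such as 'a1' and 'b2', whose trailing digits differ, the same model keeps both partitions, which matches the report that only keys ending in the same sequence are affected.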

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3149:
--------------------------------

    Attachment: HIVE-3149.1.patch.txt
    

[jira] [Updated] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3149:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed. Thanks, Kevin.
                

[jira] [Updated] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3149:
--------------------------------

    Description: 
When a query creates partitions in a table using dynamic partitions and a Block level merge is executed at the end of the query, some partitions may be lost.  Specifically, if the values of two or more dynamic partition keys end in the same sequence of digits, all but the largest will be dropped.

I was not able to confirm it, but I also suspect that if a MapReduce job is speculatively executed as part of the merge, the resulting duplicate data will not be deleted.

E.g.
insert overwrite table merge_dynamic_part partition (ds = '2008-04-08', hr)
select key, value, if(key % 2 == 0, 'a1', 'b1') as hr from srcpart_merge_dp_rc where ds = '2008-04-08';

In this query, if a Block level merge is executed at the end, only one of the partitions ds=2008-04-08/hr=a1 and ds=2008-04-08/hr=b1 will appear in the final table.

  was:
When a query creates partitions in a table using dynamic partitions and a Block Merge Task is executed at the end of the query, some partitions may be lost.  Specifically, if the values of two or more dynamic partition keys end in the same sequence of digits, all but the largest will be dropped.

I was not able to confirm it, but I also suspect that if a MapReduce job is speculatively executed as part of the merge, the resulting duplicate data will not be deleted.

E.g.
insert overwrite table merge_dynamic_part partition (ds = '2008-04-08', hr)
select key, value, if(key % 2 == 0, 'a1', 'b1') as hr from srcpart_merge_dp_rc where ds = '2008-04-08';

In this query, if a Block Merge Task is executed at the end, only one of the partitions ds=2008-04-08/hr=a1 and ds=2008-04-08/hr=b1 will appear in the final table.

        Summary: Dynamically generated paritions deleted by Block level merge  (was: Dynamically generated paritions deleted by BlockMergeTask)
    

[jira] [Updated] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-3149:
----------------------------------

    Priority: Blocker  (was: Critical)
    

[jira] [Updated] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3149:
--------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296063#comment-13296063 ] 

Kevin Wilfong commented on HIVE-3149:
-------------------------------------

Submitted a diff here: https://reviews.facebook.net/D3693
                

[jira] [Commented] (HIVE-3149) Dynamically generated paritions deleted by Block level merge

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393392#comment-13393392 ] 

Hudson commented on HIVE-3149:
------------------------------

Integrated in Hive-trunk-h0.21 #1492 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1492/])
    HIVE-3149 Dynamically generated paritions deleted by Block level merge
(Kevin Wilfong via namit) (Revision 1350946)

     Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1350946
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/test/queries/clientpositive/merge_dynamic_partition4.q
* /hive/trunk/ql/src/test/results/clientpositive/merge_dynamic_partition4.q.out

                