Posted to dev@hive.apache.org by "Sohan Jain (JIRA)" <ji...@apache.org> on 2011/06/14 20:40:48 UTC

[jira] [Created] (HIVE-2219) Make "alter table drop partition" more efficient

Make "alter table drop partition" more efficient
------------------------------------------------

                 Key: HIVE-2219
                 URL: https://issues.apache.org/jira/browse/HIVE-2219
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Sohan Jain


The current function dropTable() that handles dropping multiple partitions is somewhat inefficient.  For each partition you want to drop, it loops through each partition in the table to see if the partition exists.  This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table.  The running time of this function can be improved, which is useful for tables with many partitions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062179#comment-13062179 ] 

Paul Yang commented on HIVE-2219:
---------------------------------

I likely mixed up the ReviewBoard (RB) and JIRA versions of the patch; looking at HIVE-2275 now.

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063600#comment-13063600 ] 

Hudson commented on HIVE-2219:
------------------------------

Integrated in Hive-trunk-h0.21 #821 (See [https://builds.apache.org/job/Hive-trunk-h0.21/821/])
    HIVE-2275. Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1145368
Files : 
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java


        

[jira] [Assigned] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain reassigned HIVE-2219:
--------------------------------

    Assignee: Sohan Jain

        

[jira] [Updated] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2219:
-----------------------------

    Attachment: HIVE-2219.1.patch

Improves the time it takes to check whether a partition to delete exists in the table's list of partitions.  Overall, this improves the complexity to _O(m + n)_.
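
To illustrate the complexity claim: scanning the table's partitions once per requested partition costs O(mn), while indexing the table's partition names in a hash set once and then probing it for each requested partition costs O(m + n) expected. The following is a minimal Java sketch of that idea, not the code from HIVE-2219.1.patch; the class name, the method names, and the use of plain partition-name strings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these names are illustrative and are not Hive APIs.
final class DropPartitionLookupSketch {

    // O(m * n): for every requested partition, scan the table's full listing.
    static List<String> findExistingNested(List<String> toDrop, List<String> inTable) {
        List<String> found = new ArrayList<>();
        for (String wanted : toDrop) {
            for (String existing : inTable) {
                if (existing.equals(wanted)) {
                    found.add(wanted);
                    break;
                }
            }
        }
        return found;
    }

    // O(m + n) expected: build a hash index of the table's partitions once,
    // then do one constant-time (expected) probe per requested partition.
    static List<String> findExistingHashed(List<String> toDrop, List<String> inTable) {
        Set<String> index = new HashSet<>(inTable);
        List<String> found = new ArrayList<>();
        for (String wanted : toDrop) {
            if (index.contains(wanted)) {
                found.add(wanted);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> inTable = Arrays.asList("ds=2011-06-12", "ds=2011-06-13", "ds=2011-06-14");
        List<String> toDrop = Arrays.asList("ds=2011-06-13", "ds=2011-06-20");
        // Both approaches report the same existing partitions.
        System.out.println(findExistingNested(toDrop, inTable));
        System.out.println(findExistingHashed(toDrop, inTable));
    }
}
```

Both methods return the same result; only the number of comparisons differs, which matters when _n_ is large.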

        

[jira] [Updated] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2219:
-----------------------------

    Attachment: HIVE-2219.2.patch

This patch is the correct one from ReviewBoard.

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056180#comment-13056180 ] 

Paul Yang commented on HIVE-2219:
---------------------------------

+1 Will test and commit.

        

[jira] [Updated] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2219:
-----------------------------

    Status: Patch Available  (was: Open)

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060941#comment-13060941 ] 

Hudson commented on HIVE-2219:
------------------------------

Integrated in Hive-trunk-h0.21 #813 (See [https://builds.apache.org/job/Hive-trunk-h0.21/813/])
    HIVE-2219. Make "alter table drop partition" more efficient (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1143508
Files : 
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java


        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062132#comment-13062132 ] 

John Sichi commented on HIVE-2219:
----------------------------------

Since this patch already got committed, I would recommend opening a new JIRA issue to amend it.

        

[jira] [Updated] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2219:
-----------------------------

    Status: Open  (was: Patch Available)

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050847#comment-13050847 ] 

Paul Yang commented on HIVE-2219:
---------------------------------

Can you create a ReviewBoard review request for this?

        

[jira] [Resolved] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang resolved HIVE-2219.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0

Committed. Thanks Sohan!

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062167#comment-13062167 ] 

Sohan Jain commented on HIVE-2219:
----------------------------------

OK, please refer to HIVE-2275.

        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052941#comment-13052941 ] 

jiraposter@reviews.apache.org commented on HIVE-2219:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/941/
-----------------------------------------------------------

Review request for hive and Paul Yang.


Summary
-------

Improve the efficiency of the function that handles dropping multiple partitions by finding the partitions to drop at the JDO level instead of iterating through all given partitions and existing partitions.


This addresses bug HIVE-2219.
    https://issues.apache.org/jira/browse/HIVE-2219


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1138144 

Diff: https://reviews.apache.org/r/941/diff


Testing
-------

Still passes drop_multi_partitions.q.  Tested speed by dropping ~10k partitions from a table with ~400k partitions.  This section of code took ~10 minutes after the change and more than 30 minutes before.


Thanks,

Sohan



        

[jira] [Commented] (HIVE-2219) Make "alter table drop partition" more efficient

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050874#comment-13050874 ] 

Sohan Jain commented on HIVE-2219:
----------------------------------

Ah sorry, after another round of testing, I realized this doesn't work correctly at all for partial partition specs!  I will re-implement it and test again for speed / full correctness.
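
The comment does not say how the patch fails, so the following is background only. A partial partition spec names only some of the partition columns and is expected to select every partition that agrees on those columns, so a lookup that only tests full specs for equality would find nothing for it. A minimal Java sketch of that distinction follows; the class, the helper names, and the `ds`/`hr` partition columns are hypothetical and do not come from the patch or from Hive's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these names are illustrative and are not Hive APIs.
final class PartialSpecSketch {

    // Build an ordered column -> value map from "column=value" pairs.
    static Map<String, String> spec(String... pairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String p : pairs) {
            int eq = p.indexOf('=');
            m.put(p.substring(0, eq), p.substring(eq + 1));
        }
        return m;
    }

    // A partition matches when it agrees with every column the request names;
    // columns the request leaves out are unconstrained.
    static boolean matches(Map<String, String> partition, Map<String, String> requested) {
        for (Map.Entry<String, String> e : requested.entrySet()) {
            if (!e.getValue().equals(partition.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Collect all partitions selected by a full or partial spec.
    static List<Map<String, String>> select(List<Map<String, String>> partitions,
                                            Map<String, String> requested) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            if (matches(p, requested)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> partitions = Arrays.asList(
            spec("ds=2011-06-13", "hr=23"),
            spec("ds=2011-06-14", "hr=00"),
            spec("ds=2011-06-14", "hr=01"));
        Map<String, String> partial = spec("ds=2011-06-14");

        // Column-wise matching selects both ds=2011-06-14 partitions.
        System.out.println(select(partitions, partial));
        // Equality against full specs finds no partition for the partial spec.
        System.out.println(partitions.contains(partial));
    }
}
```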
