Posted to dev@hive.apache.org by "Sohan Jain (JIRA)" <ji...@apache.org> on 2011/08/11 01:30:36 UTC

[jira] [Created] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

Determining whether a Column Descriptor is unused may take too long
-------------------------------------------------------------------

                 Key: HIVE-2368
                 URL: https://issues.apache.org/jira/browse/HIVE-2368
             Project: Hive
          Issue Type: Bug
          Components: Metastore
            Reporter: Sohan Jain
         Attachments: HIVE-2368.1.patch

To determine whether a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a very large list of storage descriptors (SDs). This can severely slow down dropping partitions.

We can add a maximum number of SDs to return and ask for just 1 SD, since we are only doing an existence check.
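
The idea can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the code from HIVE-2368.1.patch: the class ColumnDescriptorCheck, the StorageDescriptor stand-in, the in-memory list, the method signatures, and the convention that a negative limit means "no limit" are all assumptions made for the example. In a JDO-backed store the limit would presumably be pushed into the query itself (for example through javax.jdo.Query.setRange) rather than applied in a loop.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnDescriptorCheck {

    /** Hypothetical stand-in for a storage descriptor row; only the CD id matters here. */
    public static final class StorageDescriptor {
        public final long sdId;
        public final long cdId;

        public StorageDescriptor(long sdId, long cdId) {
            this.sdId = sdId;
            this.cdId = cdId;
        }
    }

    /**
     * Returns at most maxSDs storage descriptors that reference the given
     * column descriptor. A negative maxSDs means "no limit" (an assumption
     * of this sketch).
     */
    public static List<StorageDescriptor> listStorageDescriptorsWithCD(
            List<StorageDescriptor> allSDs, long cdId, int maxSDs) {
        List<StorageDescriptor> result = new ArrayList<>();
        for (StorageDescriptor sd : allSDs) {
            // Stop scanning as soon as the requested number of matches is reached.
            if (maxSDs >= 0 && result.size() >= maxSDs) {
                break;
            }
            if (sd.cdId == cdId) {
                result.add(sd);
            }
        }
        return result;
    }

    /** Existence check: one matching SD is enough to know the CD is still in use. */
    public static boolean isColumnDescriptorUnused(List<StorageDescriptor> allSDs, long cdId) {
        return listStorageDescriptorsWithCD(allSDs, cdId, 1).isEmpty();
    }

    public static void main(String[] args) {
        List<StorageDescriptor> sds = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            sds.add(new StorageDescriptor(i, 7L)); // 1000 SDs share column descriptor 7
        }
        System.out.println(isColumnDescriptorUnused(sds, 7L)); // false
        System.out.println(isColumnDescriptorUnused(sds, 8L)); // true
    }
}
```

With the limit set to 1, the check for column descriptor 7 stops at the first matching SD instead of collecting all 1000, which is the saving the description is after.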

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135623#comment-13135623 ] 

Hudson commented on HIVE-2368:
------------------------------

Integrated in Hive-trunk-h0.21 #1034 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1034/])
    HIVE-2368. Slow dropping of partitions caused by full listing of storage descriptors (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188886
Files : 
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

                
> Slow dropping of partitions caused by full listing of storage descriptors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-2368
>                 URL: https://issues.apache.org/jira/browse/HIVE-2368
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sohan Jain
>            Assignee: Sohan Jain
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2368.1.patch
>
>
> To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can severely slow down dropping partitions.
> We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


        

[jira] [Commented] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082765#comment-13082765 ] 

jiraposter@reviews.apache.org commented on HIVE-2368:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1462/
-----------------------------------------------------------

Review request for hive and Paul Yang.


Summary
-------

To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions.

We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


This addresses bug HIVE-2368.
    https://issues.apache.org/jira/browse/HIVE-2368


Diffs
-----

  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1156401 

Diff: https://reviews.apache.org/r/1462/diff


Testing
-------


Thanks,

Sohan



> Determining whether a Column Descriptor is unused may take too long
> -------------------------------------------------------------------
>
>                 Key: HIVE-2368
>                 URL: https://issues.apache.org/jira/browse/HIVE-2368
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sohan Jain
>         Attachments: HIVE-2368.1.patch
>
>
> To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can severely slow down dropping partitions.
> We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


        

[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135387#comment-13135387 ] 

Paul Yang commented on HIVE-2368:
---------------------------------

Committed. Thanks Sohan!
                

        

[jira] [Commented] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

Posted by "Paul Yang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132215#comment-13132215 ] 

Paul Yang commented on HIVE-2368:
---------------------------------

The patch looks good; I will test and commit.
                

        

[jira] [Assigned] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang reassigned HIVE-2368:
-------------------------------

    Assignee: Sohan Jain
    

        

[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135386#comment-13135386 ] 

Paul Yang commented on HIVE-2368:
---------------------------------

+1
                
> Slow dropping of partitions caused by full listing of storage descriptors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-2368
>                 URL: https://issues.apache.org/jira/browse/HIVE-2368
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sohan Jain
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2368.1.patch
>
>
> To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can severely slow down dropping partitions.
> We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


        

[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138065#comment-13138065 ] 

Paul Yang commented on HIVE-2368:
---------------------------------

Backporting this to branch-0.8 as well.
                

        

[jira] [Commented] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

Posted by "Ning Zhang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132188#comment-13132188 ] 

Ning Zhang commented on HIVE-2368:
----------------------------------

@Paul Yang, this looks good to me. Can you also review the patch? We'll need this soon. 
                

        

[jira] [Resolved] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang resolved HIVE-2368.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0
    

        

[jira] [Updated] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2368:
-----------------------------

    Attachment: HIVE-2368.1.patch


        

[jira] [Updated] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Carl Steinbach (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2368:
---------------------------------

    Fix Version/s:     (was: 0.9.0)
                   0.8.0
    
> Slow dropping of partitions caused by full listing of storage descriptors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-2368
>                 URL: https://issues.apache.org/jira/browse/HIVE-2368
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sohan Jain
>            Assignee: Sohan Jain
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2368.1.patch
>
>
> To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can severely slow down dropping partitions.
> We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


        

[jira] [Updated] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Paul Yang (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-2368:
----------------------------

    Summary: Slow dropping of partitions caused by full listing of storage descriptors  (was: Determining whether a Column Descriptor is unused may take too long)
    
> Slow dropping of partitions caused by full listing of storage descriptors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-2368
>                 URL: https://issues.apache.org/jira/browse/HIVE-2368
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sohan Jain
>         Attachments: HIVE-2368.1.patch
>
>
> To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can severely slow down dropping partitions.
> We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check.


        

[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147389#comment-13147389 ] 

Hudson commented on HIVE-2368:
------------------------------

Integrated in Hive-0.8.0-SNAPSHOT-h0.21 #89 (See [https://builds.apache.org/job/Hive-0.8.0-SNAPSHOT-h0.21/89/])
    HIVE-2368. Slow dropping of partitions caused by full listing of storage descriptors (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199915
Files : 
* /hive/branches/branch-0.8/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

                