Posted to dev@hive.apache.org by "Sohan Jain (JIRA)" <ji...@apache.org> on 2011/06/10 08:50:59 UTC

[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()

Optimize get_partition_names_ps()
---------------------------------

                 Key: HIVE-2213
                 URL: https://issues.apache.org/jira/browse/HIVE-2213
             Project: Hive
          Issue Type: Improvement
          Components: Metastore
            Reporter: Sohan Jain
            Assignee: Sohan Jain


If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.
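To make the idea concrete, here is a minimal Java sketch. It is not the code from the attached patches. A partial specification, where an empty or missing value means "any value", is turned into a regular expression over partition names of the form key1=val1/key2=val2. The class and method names are hypothetical, and the sketch assumes partition values contain no characters that Hive would escape in a partition name. filterNames() shows the unoptimized approach of matching against a fully fetched list. A pushed-down version would instead hand an equivalent pattern to the datastore query (for example through a JDOQL filter), and how the pattern must be escaped there depends on the datastore.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not Hive's actual implementation. Partition names are
// assumed to look like "key1=val1/key2=val2" with no Hive path escaping needed.
final class PartialSpecNameFilter {
  private PartialSpecNameFilter() {}

  // Builds a regex over partition names from a partial specification.
  // An empty or missing value matches any value for that key.
  static String toNameRegex(List<String> partKeys, List<String> partialVals) {
    if (partialVals.size() > partKeys.size()) {
      throw new IllegalArgumentException("more values than partition keys");
    }
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < partKeys.size(); i++) {
      if (i > 0) {
        regex.append('/');
      }
      regex.append(Pattern.quote(partKeys.get(i))).append('=');
      String val = i < partialVals.size() ? partialVals.get(i) : "";
      // Unspecified value: any run of characters up to the next '/'.
      regex.append(val.isEmpty() ? "[^/]*" : Pattern.quote(val));
    }
    return regex.toString();
  }

  // The unoptimized approach: every name is fetched first, then filtered here.
  static List<String> filterNames(Collection<String> allNames, String nameRegex) {
    Pattern pattern = Pattern.compile(nameRegex);
    List<String> matches = new ArrayList<>();
    for (String name : allNames) {
      if (pattern.matcher(name).matches()) {
        matches.add(name);
      }
    }
    return matches;
  }
}
```

For a table partitioned by (ds, hr), the values ["2011-06-10", ""] select every hour of that day, while ["", "00"] select hour 00 of every day.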

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2213:
-----------------------------

    Attachment: HIVE-2213.3.patch

-Fixed line that exceeded 100 chars


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050586#comment-13050586 ] 

Paul Yang commented on HIVE-2213:
---------------------------------

Looks good, but can you do a minor update to fix lines longer than 100 chars?


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050803#comment-13050803 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

(Updated 2011-06-16 23:30:02.425588)


Review request for hive and Paul Yang.


Changes
-------

-Fixed line that exceeded 100 chars


Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.


This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 

Diff: https://reviews.apache.org/r/878/diff


Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan




[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2213:
-----------------------------

    Status: Patch Available  (was: Open)


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051330#comment-13051330 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

(Updated 2011-06-17 21:22:00.028428)


Review request for hive and Paul Yang.


Changes
-------

- made getPartitionPsQueryResults() return a parameterized type to avoid lots of casting


Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.


This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1136751 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 

Diff: https://reviews.apache.org/r/878/diff


Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan




[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047041#comment-13047041 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

Review request for hive and Paul Yang.


Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.


This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 

Diff: https://reviews.apache.org/r/878/diff


Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan




[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047452#comment-13047452 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
-----------------------------------------------------------


You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique?


trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
<https://reviews.apache.org/r/878/#comment1753>

    Can you refactor this together with the function above, since they are similar?



trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
<https://reviews.apache.org/r/878/#comment1754>

    Same here



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1755>

    To be consistent with the other method, maybe call this listPartitionNamesPs?



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
<https://reviews.apache.org/r/878/#comment1756>

    Combine with above


- Paul


On 2011-06-10 07:05:56, Sohan Jain wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/878/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-06-10 07:05:56)
bq.  
bq.  
bq.  Review request for hive and Paul Yang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.
bq.  
bq.  
bq.  This addresses bug HIVE-2213.
bq.      https://issues.apache.org/jira/browse/HIVE-2213
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 
bq.    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 
bq.  
bq.  Diff: https://reviews.apache.org/r/878/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sohan
bq.  
bq.




[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-2213:
----------------------------

    Summary: Optimize partial specification metastore functions  (was: Optimize get_partition_names_ps())


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050841#comment-13050841 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review858
-----------------------------------------------------------



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1877>

    Can we make this method generic to reduce the number of casts required? E.g.
    
    private <T> Collection<T> getPartition...
    
    We might have to do something like this.<String>getPartition... when making the call, though.


- Paul
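The review suggestion above can be illustrated with a small, self-contained Java sketch; the names are hypothetical and this is not the code from the patch. The untyped result, which a JDO Query.execute() call returns as Object, is cast once inside a generic method. When the call has an assignment target, Java infers T. When it does not, as in an enhanced for loop, T is inferred as Object and an explicit type witness is needed, which Java only accepts with a qualifier such as this.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the review suggestion; not Hive's ObjectStore code.
final class GenericResultDemo {
  // Stand-in for an untyped query result (JDO's Query.execute() returns Object).
  private final Object rawResult;

  GenericResultDemo(Object rawResult) {
    this.rawResult = rawResult;
  }

  // The one unchecked cast lives here; each caller chooses T.
  @SuppressWarnings("unchecked")
  private <T> Collection<T> getQueryResults() {
    return (Collection<T>) rawResult;
  }

  int countNames() {
    // T is inferred from the assignment target, so no type witness is needed.
    Collection<String> names = getQueryResults();
    return names.size();
  }

  List<String> upperCaseNames() {
    List<String> out = new ArrayList<>();
    // No target type here, so T would be inferred as Object. The explicit
    // witness fixes T = String; Java requires a qualifier such as "this.".
    for (String name : this.<String>getQueryResults()) {
      out.add(name.toUpperCase(Locale.ROOT));
    }
    return out;
  }
}
```

The unchecked cast is still unchecked; the design choice only concentrates it in one place so call sites stay cast-free.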


On 2011-06-16 23:30:02, Sohan Jain wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/878/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-06-16 23:30:02)
bq.  
bq.  
bq.  Review request for hive and Paul Yang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.
bq.  
bq.  
bq.  This addresses bug HIVE-2213.
bq.      https://issues.apache.org/jira/browse/HIVE-2213
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
bq.    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 
bq.  
bq.  Diff: https://reviews.apache.org/r/878/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sohan
bq.  
bq.




[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051419#comment-13051419 ] 

Paul Yang commented on HIVE-2213:
---------------------------------

If get_partitions_ps_with_auth() was not correct before, then we should fix it to produce the correct behavior. Ideally, that fix would have gone in a separate JIRA, but it's okay to include it in this one.

+1 looks good though, will test and commit.


[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2213:
-----------------------------

    Attachment: HIVE-2213.1.patch


[jira] [Commented] (HIVE-2213) Optimize partial specification metastore functions

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054064#comment-13054064 ] 

Hudson commented on HIVE-2213:
------------------------------

Integrated in Hive-trunk-h0.21 #790 (See [https://builds.apache.org/job/Hive-trunk-h0.21/790/])
    


[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-2213:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0
           Status: Resolved  (was: Patch Available)

Committed. Thanks Sohan!


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051333#comment-13051333 ] 

Sohan Jain commented on HIVE-2213:
----------------------------------

I'd also like to point out one more thing. The previous implementation of get_partitions_ps_with_auth() did not actually use the supplied user name or group names, nor did it set any privileges on the returned partitions.

This patch sets those privileges, which unfortunately slows down get_partitions_ps_with_auth(), since we have to iterate through all of the matching partitions and set privileges on each one before returning them. What is the desired behavior here?
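The cost described above can be modeled with a small, hypothetical Java sketch. The types are simplified stand-ins, not Hive's Partition or privilege classes, and PrivilegeLookup is assumed to cost one metastore query per call. Because privileges are looked up and attached for each returned partition, the extra work grows linearly with the number of partitions that match the partial specification, regardless of how efficiently the names themselves were fetched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are not Hive's Partition or
// privilege classes, only an illustration of the per-partition cost.
final class PartitionPrivilegeDemo {
  private PartitionPrivilegeDemo() {}

  interface PrivilegeLookup {
    // Assumed to cost one metastore round trip per call.
    String privilegesFor(String partName, String userName, List<String> groupNames);
  }

  static final class PartitionWithPrivileges {
    final String name;
    final String privileges;

    PartitionWithPrivileges(String name, String privileges) {
      this.name = name;
      this.privileges = privileges;
    }
  }

  // One lookup per returned partition: the extra work grows linearly with
  // the number of partitions that match the partial specification.
  static List<PartitionWithPrivileges> attachPrivileges(List<String> matchingNames,
      String userName, List<String> groupNames, PrivilegeLookup lookup) {
    List<PartitionWithPrivileges> result = new ArrayList<>(matchingNames.size());
    for (String name : matchingNames) {
      result.add(new PartitionWithPrivileges(
          name, lookup.privilegesFor(name, userName, groupNames)));
    }
    return result;
  }
}
```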


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048781#comment-13048781 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

(Updated 2011-06-13 21:11:38.325243)


Review request for hive and Paul Yang.


Changes
-------

-Refactored similar functions
-Renamed getPartitionNamesPs() to listPartitionNamesPs()
-Modified get_partitions_ps() and get_partitions_ps_with_auth() for a similar optimization


Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.


This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 

Diff: https://reviews.apache.org/r/878/diff


Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan




[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050585#comment-13050585 ] 

jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review853
-----------------------------------------------------------



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1862>

    Line exceeds 100 char limit


- Paul


On 2011-06-13 21:11:38, Sohan Jain wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/878/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-06-13 21:11:38)
bq.  
bq.  
bq.  Review request for hive and Paul Yang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the filtering can be pushed down to the JDO layer without fetching all of the names first.
bq.  
bq.  
bq.  This addresses bug HIVE-2213.
bq.      https://issues.apache.org/jira/browse/HIVE-2213
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
bq.    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
bq.    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 
bq.  
bq.  Diff: https://reviews.apache.org/r/878/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sohan
bq.  
bq.


