Posted to dev@hive.apache.org by "Sohan Jain (JIRA)" <ji...@apache.org> on 2011/06/10 08:50:59 UTC
[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()
Optimize get_partition_names_ps()
---------------------------------
Key: HIVE-2213
URL: https://issues.apache.org/jira/browse/HIVE-2213
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
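To illustrate the idea described above, here is a minimal sketch in Java. It is not the code from the attached patches: the class name PartialSpecSketch and both helper methods are hypothetical, partition names are assumed to be key=value pairs joined by "/", and value escaping is ignored. The sketch turns a partial specification into a name pattern. The in-memory filter only stands in for the datastore, which in a real pushdown would apply the pattern itself so that non-matching names are never loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the HIVE-2213 patches.
final class PartialSpecSketch {

  // Builds a regex over names such as "ds=2011-06-10/hr=12".
  // An empty or missing value means "any value for this key".
  static String toNameRegex(List<String> keys, List<String> partialValues) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < keys.size(); i++) {
      if (i > 0) {
        sb.append('/');
      }
      sb.append(Pattern.quote(keys.get(i))).append('=');
      String value = i < partialValues.size() ? partialValues.get(i) : "";
      sb.append(value.isEmpty() ? "[^/]*" : Pattern.quote(value));
    }
    return sb.toString();
  }

  // Stand-in for the datastore: keeps only the names that match the pattern.
  static List<String> filter(List<String> allNames, String regex) {
    Pattern pattern = Pattern.compile(regex);
    List<String> matched = new ArrayList<>();
    for (String name : allNames) {
      if (pattern.matcher(name).matches()) {
        matched.add(name);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    String regex = toNameRegex(Arrays.asList("ds", "hr"), Arrays.asList("2011-06-10", ""));
    List<String> names = Arrays.asList(
        "ds=2011-06-10/hr=11", "ds=2011-06-10/hr=12", "ds=2011-06-11/hr=11");
    System.out.println(filter(names, regex));
  }
}
```

The point of the sketch is where the filter runs: if the pattern is evaluated by the datastore query, only the matching names cross into the metastore process.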
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sohan Jain updated HIVE-2213:
-----------------------------
Attachment: HIVE-2213.3.patch
-Fixed line that exceeded 100 chars
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050586#comment-13050586 ]
Paul Yang commented on HIVE-2213:
---------------------------------
Looks good, but can you do a minor update to fix lines longer than 100 chars?
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050803#comment-13050803 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------
(Updated 2011-06-16 23:30:02.425588)
Review request for hive and Paul Yang.
Changes
-------
-Fixed line that exceeded 100 chars
Summary
-------
If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227
Diff: https://reviews.apache.org/r/878/diff
Testing
-------
Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
Thanks,
Sohan
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sohan Jain updated HIVE-2213:
-----------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051330#comment-13051330 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------
(Updated 2011-06-17 21:22:00.028428)
Review request for hive and Paul Yang.
Changes
-------
- made getPartitionPsQueryResults() return a parameterized type to avoid lots of casting
Summary
-------
If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1136751
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1136751
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751
Diff: https://reviews.apache.org/r/878/diff
Testing
-------
Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
Thanks,
Sohan
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047041#comment-13047041 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------
Review request for hive and Paul Yang.
Summary
-------
If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213
Diffs
-----
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205
Diff: https://reviews.apache.org/r/878/diff
Testing
-------
Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
Thanks,
Sohan
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047452#comment-13047452 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
-----------------------------------------------------------
You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique?
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
<https://reviews.apache.org/r/878/#comment1753>
Can you refactor with the above function since they are similar?
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
<https://reviews.apache.org/r/878/#comment1754>
Same here
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1755>
To be consistent with the other method, maybe call this listPartitionNamesPs?
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
<https://reviews.apache.org/r/878/#comment1756>
Combine with above
- Paul
[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-2213:
----------------------------
Summary: Optimize partial specification metastore functions (was: Optimize get_partition_names_ps())
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050841#comment-13050841 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review858
-----------------------------------------------------------
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1877>
Can we make this method parameterized to reduce the number of casts required? E.g.
private <T> Collection <T> getPartition...
We might have to do something like <String>getPartition... when making the call though.
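A minimal sketch of the suggestion above, assuming Java generics as in the rest of the metastore code. The class and method names here (TypedResultSketch, runUntypedQuery, typedResults) are hypothetical; the real method name is elided in the comment and is not filled in here. The sketch shows one generic method absorbing the cast, and the explicit type argument mentioned for call sites where the compiler has no target type to infer from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not taken from the HIVE-2213 patches.
final class TypedResultSketch {

  // Stand-in for a query API that hands back an untyped result.
  static Object runUntypedQuery() {
    return new ArrayList<Object>(Arrays.asList("ds=2011-06-10", "ds=2011-06-11"));
  }

  // One unchecked cast here replaces a cast at every call site.
  // The cast is not verified at runtime, so the caller must request
  // the element type that the query actually produces.
  @SuppressWarnings("unchecked")
  static <T> Collection<T> typedResults() {
    return (Collection<T>) runUntypedQuery();
  }

  public static void main(String[] args) {
    // T is inferred from the assignment target.
    Collection<String> names = typedResults();

    // Explicit type argument, for places where T cannot be inferred from context.
    List<String> copy = new ArrayList<String>(TypedResultSketch.<String>typedResults());

    System.out.println(names.size() + " " + copy.size());
  }
}
```

The trade-off is that type safety now rests on the caller choosing the right T; a wrong choice surfaces later as a ClassCastException when an element is used, not at the cast itself.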
- Paul
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051419#comment-13051419 ]
Paul Yang commented on HIVE-2213:
---------------------------------
If get_partitions_ps_with_auth() was not correct before, then we should fix the method to produce the correct behavior. Ideally, it should have been done in a separate JIRA, but it should be okay to include in this one.
+1 looks good though, will test and commit.
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sohan Jain updated HIVE-2213:
-----------------------------
Attachment: HIVE-2213.1.patch
[jira] [Commented] (HIVE-2213) Optimize partial specification metastore functions
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054064#comment-13054064 ]
Hudson commented on HIVE-2213:
------------------------------
Integrated in Hive-trunk-h0.21 #790 (See [https://builds.apache.org/job/Hive-trunk-h0.21/790/])
[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-2213:
----------------------------
Resolution: Fixed
Fix Version/s: 0.8.0
Status: Resolved (was: Patch Available)
Committed. Thanks Sohan!
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "Sohan Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051333#comment-13051333 ]
Sohan Jain commented on HIVE-2213:
----------------------------------
I'd also like to point out one more thing. The previous implementation of get_partitions_ps_with_auth() did not actually make use of the supplied user name or group name, nor did it set any auth privileges on the desired partitions.
This patch adds authentication privileges, which unfortunately slows down get_partitions_ps_with_auth(), since we have to iterate through all of the partitions and set privileges before returning them. What is the desired behavior here?
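A rough sketch of the cost described above, in Java. Everything in it is hypothetical (the Part class, lookupPrivileges, withAuth) and none of it is taken from the patch; it only shows the shape of the extra work: one privilege lookup per matching partition before the list can be returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the HIVE-2213 patches.
final class AuthPartitionSketch {

  // Minimal stand-in for a partition object that can carry privileges.
  static final class Part {
    final String name;
    String privileges; // stays null until privileges are attached

    Part(String name) {
      this.name = name;
    }
  }

  // Stand-in for a per-partition privilege lookup for a user and groups.
  static String lookupPrivileges(String partName, String user, List<String> groups) {
    return user + groups + " on " + partName;
  }

  // The loop is the extra cost: work grows linearly with the number of
  // matching partitions, on top of the query that found them.
  static List<Part> withAuth(List<String> matchedNames, String user, List<String> groups) {
    List<Part> result = new ArrayList<>();
    for (String name : matchedNames) {
      Part part = new Part(name);
      part.privileges = lookupPrivileges(name, user, groups);
      result.add(part);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Part> parts = withAuth(Arrays.asList("ds=1", "ds=2"), "alice", Arrays.asList("dev"));
    for (Part part : parts) {
      System.out.println(part.name + " -> " + part.privileges);
    }
  }
}
```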
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048781#comment-13048781 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------
(Updated 2011-06-13 21:11:38.325243)
Review request for hive and Paul Yang.
Changes
-------
-Refactored similar functions
-Renamed getPartitionNamesPs() to listPartitionNamesPs()
-Modified get_partitions_ps() and get_partitions_ps_with_auth() for a similar optimization
Summary
-------
If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227
Diff: https://reviews.apache.org/r/878/diff
Testing
-------
Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
Thanks,
Sohan
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050585#comment-13050585 ]
jiraposter@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review853
-----------------------------------------------------------
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/878/#comment1862>
Line exceeds 100 char limit
- Paul