Posted to dev@hive.apache.org by "Ajay Kidave (JIRA)" <ji...@apache.org> on 2010/09/21 19:29:33 UTC
[jira] Created: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Change get_partitions_ps to pass partition filter to database
-------------------------------------------------------------
Key: HIVE-1660
URL: https://issues.apache.org/jira/browse/HIVE-1660
Project: Hadoop Hive
Issue Type: Improvement
Components: Metastore
Reporter: Ajay Kidave
Assignee: Paul Yang
Support for partition pruning by passing the partition filter to the database was added by HIVE-1609. Changing get_partitions_ps to use this could improve performance for tables with a large number of partitions. A listPartitionNamesByFilter API might be required to implement this for use from Hive.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920807#action_12920807 ]
Namit Jain commented on HIVE-1660:
----------------------------------
I will take a look
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Attachment: HIVE-1660.2.patch
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Status: Patch Available (was: Open)
Full suite passed on my end
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660.4.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Affects Version/s: 0.7.0
Status: Patch Available (was: Open)
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921063#action_12921063 ]
Namit Jain commented on HIVE-1660:
----------------------------------
The following tests failed:
bucket3.q
stats10.q
stats2.q
stats8.q
union22.q
reduce_deduplicate.q (in minimr)
input2.q, input3.q (TestParse)
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1660:
-----------------------------
Resolution: Fixed
Fix Version/s: 0.7.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks Paul
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660.4.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1660:
-----------------------------
Status: Open (was: Patch Available)
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Attachment: HIVE-1660_regex.patch
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Attachment: HIVE-1660.4.patch
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660.4.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Attachment: HIVE-1660.1.patch
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922405#action_12922405 ]
Paul Yang commented on HIVE-1660:
---------------------------------
Also, HIVE-1660.4.patch is the refreshed version.
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660.4.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921634#action_12921634 ]
Namit Jain commented on HIVE-1660:
----------------------------------
Paul, the patch does not apply cleanly. Can you refresh it?
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Yang updated HIVE-1660:
----------------------------
Attachment: HIVE-1660.3.patch
* Fixed causes of test failures
* Fixed bugs with the filter optimization
* Added additional test cases
I'll submit the patch once the full test suite completes.
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.7.0
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660.2.patch, HIVE-1660.3.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.
[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass
partition filter to database
Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ]
Paul Yang commented on HIVE-1660:
---------------------------------
HIVE-1660.1.patch is the main patch. It creates a listPartitionNamesByFilter() method and fixes get_partitions_ps() and get_partition_names_ps() to use the new filter APIs. In addition, the patch adds an optimization that uses a partition name regex for filtering in cases of equality comparisons.
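To illustrate how a partial partition specification might be turned into a filter for the database, here is a minimal sketch. It is not taken from the patch: the class PartialSpecFilter, the method toFilter, and the exact filter syntax (column = "value" terms joined by "and") are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: turns a partial partition spec
// into a filter string. The filter syntax used here is an assumption.
class PartialSpecFilter {

    // partCols: partition column names in order, e.g. [ds, hr].
    // partVals: values in the same order; an empty or null value means
    // "any value" and contributes no term to the filter.
    static String toFilter(List<String> partCols, List<String> partVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partCols.size() && i < partVals.size(); i++) {
            String value = partVals.get(i);
            if (value == null || value.isEmpty()) {
                continue; // unspecified column: do not constrain it
            }
            if (sb.length() > 0) {
                sb.append(" and ");
            }
            sb.append(partCols.get(i)).append(" = \"").append(value).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only ds is specified, so only ds appears in the filter.
        System.out.println(toFilter(Arrays.asList("ds", "hr"),
                                    Arrays.asList("2010-10-01", "")));
    }
}
```

The sketch does no quoting or escaping of values, which a real implementation would need.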
HIVE-1660_regex.patch was a small experiment to measure the potential speedup from filtering on a more complete regex of the partition name. For example, for a table partitioned on ds and hr, this patch uses a regex like 'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a table with ~5 million partitions and ~15K partitions a day, getting the partitions for a single day took ~1s with this regex patch vs ~10s with the filter patch. Since a table with 5 million partitions is a very unusual case, I didn't think the speedup was worth the additional complexity.
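The partition name regex idea can be sketched as follows. This is only an illustration, not the code from HIVE-1660_regex.patch: the class PartitionNameRegex and its methods are hypothetical, the matching is done in memory rather than in the database, and it uses [^/]* in place of .* for unspecified columns so that a wildcard cannot run across a '/' separator. It also ignores any escaping of special characters in partition names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: matches partition names such as
// "ds=2010-10-01/hr=12" against a partial partition spec.
class PartitionNameRegex {

    // Builds a regex over the full partition name. An empty or missing
    // value means "any value" for that partition column.
    static String build(List<String> partCols, List<String> partVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partCols.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(Pattern.quote(partCols.get(i) + "="));
            String value = i < partVals.size() ? partVals.get(i) : null;
            if (value == null || value.isEmpty()) {
                sb.append("[^/]*");              // wildcard within one level
            } else {
                sb.append(Pattern.quote(value)); // literal match
            }
        }
        return sb.toString();
    }

    // Returns the names that match the regex in full.
    static List<String> filter(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "ds=2010-10-01/hr=00", "ds=2010-10-01/hr=23", "ds=2010-10-02/hr=00");
        String regex = build(Arrays.asList("ds", "hr"),
                             Arrays.asList("2010-10-01", ""));
        System.out.println(filter(names, regex));
    }
}
```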
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database is added by HIVE-1609. Changing get_partitions_ps to use this could result in performance improvement for tables having large number of partitions. A listPartitionNamesByFilter API might be required for implementing this for use from Hive.