Posted to dev@hive.apache.org by "Ajay Kidave (JIRA)" <ji...@apache.org> on 2010/09/21 19:29:33 UTC

[jira] Created: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Change get_partitions_ps to pass partition filter to database
-------------------------------------------------------------

                 Key: HIVE-1660
                 URL: https://issues.apache.org/jira/browse/HIVE-1660
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Metastore
            Reporter: Ajay Kidave
            Assignee: Paul Yang


HIVE-1609 added support for partition pruning by passing the partition filter to the database. Changing get_partitions_ps to use this could improve performance for tables with a large number of partitions. A listPartitionNamesByFilter API might be required to implement this for use from Hive.
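A minimal sketch of the idea: get_partitions_ps receives a partial partition specification, which could be translated into a filter string for the HIVE-1609 pushdown path. This assumes the partial spec is a list of values aligned with the table's partition keys (an empty string leaves that key unrestricted), and that the filter syntax accepts double-quoted string literals joined with `and`. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: partial partition spec -> metastore filter string. */
final class PartialSpecFilter {

    /**
     * Builds e.g. ds = "2010-10-01" and hr = "12".
     * Assumption: partVals[i] corresponds to partitionKeys[i]; an empty value
     * (or a missing trailing value) means that key is not restricted.
     */
    static String toFilter(List<String> partitionKeys, List<String> partVals) {
        if (partVals.size() > partitionKeys.size()) {
            throw new IllegalArgumentException("more values than partition keys");
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < partVals.size(); i++) {
            String value = partVals.get(i);
            if (value.isEmpty()) {
                continue; // unrestricted key: no predicate is pushed down
            }
            // Assumption: no escaping is implemented, so reject embedded quotes.
            if (value.contains("\"")) {
                throw new IllegalArgumentException("quote in value: " + value);
            }
            clauses.add(partitionKeys.get(i) + " = \"" + value + "\"");
        }
        // An empty result means "no filter" (all partitions match).
        return String.join(" and ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(toFilter(List.of("ds", "hr"), List.of("2010-10-01", "")));
        // prints: ds = "2010-10-01"
    }
}
```

Pushing this string down lets the database discard non-matching partitions instead of returning every partition of the table to the metastore for filtering.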

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920807#action_12920807 ] 

Namit Jain commented on HIVE-1660:
----------------------------------

I will take a look


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Attachment: HIVE-1660.2.patch


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Status: Patch Available  (was: Open)

Full suite passed on my end


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Affects Version/s: 0.7.0
               Status: Patch Available  (was: Open)


[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921063#action_12921063 ] 

Namit Jain commented on HIVE-1660:
----------------------------------

The following tests failed:

bucket3.q
stats10.q
stats2.q
stats8.q
union22.q


reduce_deduplicate.q (in minimr)


input2.q, input3.q (TestParse)


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1660:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.7.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed. Thanks, Paul.


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1660:
-----------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Attachment: HIVE-1660_regex.patch

[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Attachment: HIVE-1660.4.patch


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Attachment: HIVE-1660.1.patch

[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922405#action_12922405 ] 

Paul Yang commented on HIVE-1660:
---------------------------------

Also, HIVE-1660.4.patch is the refreshed version.


[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921634#action_12921634 ] 

Namit Jain commented on HIVE-1660:
----------------------------------

Paul, the patch does not apply cleanly. Can you refresh it?


[jira] Updated: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1660:
----------------------------

    Attachment: HIVE-1660.3.patch

* Fixed causes of test failures
* Fixed bugs with the filter optimization
* Added additional test cases

I'll submit the patch once the full test suite completes.


[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ] 

Paul Yang commented on HIVE-1660:
---------------------------------

HIVE-1660.1.patch is the main patch: it creates a listPartitionNamesByFilter() method and fixes get_partitions_ps() and get_partition_names_ps() to use the new filter APIs. In addition, the patch makes an optimization to use a partition name regex for filtering in cases of equality comparisons.
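To illustrate how an equality comparison can be answered from the partition name alone, here is a minimal sketch. It assumes partition names have the form `k1=v1/k2=v2/...` and that values need no escaping. The names are hypothetical, the patch's own code is not reproduced here, and this sketch filters names in memory with `java.util.regex` rather than inside the database query.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch: evaluate key = value against partition names. */
final class EqualityNameMatch {

    /** Matches names containing the whole segment key=value in any position. */
    static Pattern equalityPattern(String key, String value) {
        // The segment must start the name or follow a '/', and must end the
        // name or be followed by a '/'. This keeps ds=2010-10-01 from matching
        // ds=2010-10-011 or xds=2010-10-01.
        return Pattern.compile("(^|.*/)" + Pattern.quote(key + "=" + value) + "(/.*|$)");
    }

    static List<String> filter(List<String> partitionNames, String key, String value) {
        Pattern p = equalityPattern(key, value);
        return partitionNames.stream()
                .filter(name -> p.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "ds=2010-10-01/hr=00",
                "ds=2010-10-01/hr=01",
                "ds=2010-10-02/hr=00");
        System.out.println(filter(names, "ds", "2010-10-01"));
        // prints: [ds=2010-10-01/hr=00, ds=2010-10-01/hr=01]
    }
}
```

`Pattern.quote` keeps characters such as `.` or `-` in partition values from being read as regex syntax.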

HIVE-1660_regex.patch was a small experiment to test the potential speedup from filtering on a more complete regex of the partition name. For example, for a table partitioned on ds and hr, this patch uses a regex like 'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a table with ~5 million partitions and ~15K partitions a day, getting the partitions for a single day took ~1s with this regex patch vs ~10s with the filter patch. Since a table with 5 million partitions is a very unusual case, I didn't think the speedup was worth the additional complexity.
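A regex like the one above can be generated from the table's ordered partition keys and a partial specification. This is a sketch under stated assumptions: partition names are `k1=v1/k2=v2/...` in partition-key order, values need no escaping, and the class and method names are hypothetical. It matches unspecified keys with `[^/]*` rather than the `.*` in the example, so that a wildcard cannot span more than one name segment when the open key is not the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch: build a full partition-name regex from a partial spec. */
final class PartitionNameRegex {

    static Pattern fromPartialSpec(List<String> partitionKeys, Map<String, String> partialSpec) {
        List<String> segments = new ArrayList<>();
        for (String key : partitionKeys) {
            String value = partialSpec.get(key);
            if (value == null || value.isEmpty()) {
                // Unspecified key: any value within this single segment.
                segments.add(Pattern.quote(key + "=") + "[^/]*");
            } else {
                // Specified key: the exact key=value segment.
                segments.add(Pattern.quote(key + "=" + value));
            }
        }
        // Segments appear in partition-key order, separated by '/'.
        return Pattern.compile(String.join("/", segments));
    }

    public static void main(String[] args) {
        Pattern p = fromPartialSpec(List.of("ds", "hr"), Map.of("ds", "2010-10-01"));
        System.out.println(p.matcher("ds=2010-10-01/hr=13").matches()); // true
        System.out.println(p.matcher("ds=2010-10-02/hr=13").matches()); // false
    }
}
```

Because the whole name is anchored, one pattern covers every specified key at once, which is what distinguishes this approach from evaluating each equality comparison separately.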
