Posted to issues@hive.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2020/04/08 06:31:00 UTC

[jira] [Updated] (HIVE-22957) Add Predicate Filtering In MSCK REPAIR TABLE

     [ https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman updated HIVE-22957:
-----------------------------------------
    Attachment: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf

> Add Predicate Filtering In MSCK REPAIR TABLE
> --------------------------------------------
>
>                 Key: HIVE-22957
>                 URL: https://issues.apache.org/jira/browse/HIVE-22957
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf, HIVE-22957.01.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the MSCK command supports either a full repair of a table (all partitions) or a repair of some subset of partitions based on partitionSpec. The aim of this Jira is to introduce a filterExp (=, !=, <, >, >=, <=, LIKE) in the MSCK command so that a larger subset of partitions can be recovered (added/deleted) without firing a full repair, which might take time if the number of partitions is huge.
> *Approach*:
> The initial approach is to add a WHERE clause to the MSCK command, e.g.: MSCK REPAIR TABLE <tbl_name> ADD|DROP|SYNC PARTITIONS WHERE <pcol1> <filter_operator> <value> AND ....
> *Flow:*
> 1) Parse the WHERE clause and generate the filterExpression.
> 2) Fetch all the partitions from the metastore that match the filter expression.
> 3) Fetch all the partition files from the filesystem.
> 4) Remove all the partition paths that do not match the filter expression.
> 5) Based on ADD | DROP | SYNC, do the remaining steps.
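
The five-step flow above can be sketched roughly as follows. This is only an illustration in Python, not the code from the attached patch or design doc: the function names (parse_filter, matches, plan_repair), the list-of-paths inputs, and the plain string comparison of partition values are all assumptions made for the example. It also assumes that ADD registers filesystem partitions missing from the metastore, DROP removes metastore partitions missing from the filesystem, and SYNC does both.

```python
import operator
import re

# Hypothetical operator table for the comparison operators named in the issue.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, ">=": operator.ge, "<=": operator.le}


def like(value, pattern):
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, value) is not None


def parse_filter(where):
    """Step 1: turn 'col op value AND ...' into (col, op, value) triples."""
    conds = []
    for clause in re.split(r"\s+AND\s+", where.strip(), flags=re.IGNORECASE):
        m = re.fullmatch(r"(\w+)\s*(>=|<=|!=|=|<|>|LIKE)\s*'?([^']*)'?",
                         clause.strip(), flags=re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported filter clause: {clause!r}")
        col, op, val = m.groups()
        conds.append((col, op.upper(), val))
    return conds


def spec_from_path(path):
    """Turn 'dt=2020-04-01/country=US' into {'dt': '2020-04-01', 'country': 'US'}."""
    return dict(part.split("=", 1) for part in path.strip("/").split("/"))


def matches(spec, conds):
    """True if the partition spec satisfies every condition (values compared as strings)."""
    for col, op, val in conds:
        actual = spec[col]
        ok = like(actual, val) if op == "LIKE" else OPS[op](actual, val)
        if not ok:
            return False
    return True


def plan_repair(mode, where, metastore_paths, fs_paths):
    """Steps 2-5: filter both sides, then diff them according to ADD | DROP | SYNC."""
    conds = parse_filter(where)
    # Step 2: metastore partitions that match the filter.
    in_ms = {p for p in metastore_paths if matches(spec_from_path(p), conds)}
    # Steps 3-4: filesystem partition paths, minus those that do not match.
    on_fs = {p for p in fs_paths if matches(spec_from_path(p), conds)}
    # Step 5: decide what to add to or drop from the metastore.
    to_add = sorted(on_fs - in_ms) if mode in ("ADD", "SYNC") else []
    to_drop = sorted(in_ms - on_fs) if mode in ("DROP", "SYNC") else []
    return to_add, to_drop


metastore = ["dt=2020-04-01/country=US", "dt=2020-04-02/country=US",
             "dt=2020-03-01/country=US"]
filesystem = ["dt=2020-04-02/country=US", "dt=2020-04-03/country=US",
              "dt=2020-04-03/country=IN", "dt=2020-02-01/country=US"]
to_add, to_drop = plan_repair("SYNC", "dt >= '2020-04-01' AND country LIKE 'U%'",
                              metastore, filesystem)
print("add:", to_add)
print("drop:", to_drop)
```

In this sketch, partitions outside the filter (for example dt=2020-03-01 in the metastore or dt=2020-02-01 on the filesystem) are never touched, which is the point of filtering both sides before the diff. A real implementation would presumably push the filter down to the metastore rather than listing every partition first.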



--
This message was sent by Atlassian Jira
(v8.3.4#803005)