You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/04/08 06:30:00 UTC

[jira] [Work logged] (HIVE-22957) Add Predicate Filtering In MSCK REPAIR TABLE

     [ https://issues.apache.org/jira/browse/HIVE-22957?focusedWorklogId=418279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418279 ]

ASF GitHub Bot logged work on HIVE-22957:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/20 06:29
            Start Date: 08/Apr/20 06:29
    Worklog Time Spent: 10m 
      Work Description: shameersss1 commented on pull request #959: HIVE-22957: Support For Filter Expression In MSCK Command
URL: https://github.com/apache/hive/pull/959#discussion_r405286793
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
 ##########
 @@ -80,4 +108,48 @@ private boolean isMsckAddPartition(int type) {
   private boolean isMsckDropPartition(int type) {
     return type == HiveParser.KW_SYNC || type == HiveParser.KW_DROP;
   }
+
+  private String getPartitionFilter(CommonTree ast, Map<String, String> partColNameTypeMap) throws SemanticException {
+    for (int childIndex = 0; childIndex < ast.getChildCount(); childIndex++) {
+      ASTNode partSpecNode = (ASTNode) ast.getChild(childIndex);
+      if (partSpecNode.getType() == HiveParser.TOK_WHERE) {
+        return constructPartitionFilterFromAST((ASTNode) partSpecNode.getChild(0), "", partColNameTypeMap);
+      }
+    }
+    return null;
+  }
+
+  private String constructPartitionFilterFromAST(ASTNode node, String filterStr,
+                                                 Map<String, String> partColNameTypeMap) throws SemanticException {
+    String operator = node.getText().toLowerCase();
+    if (HiveMetaStoreChecker.Operator.isValidOperator(operator)) {
+      String partKey = node.getChild(0).getChild(0).getText();
+
+      // Sanity Check
+      if (partColNameTypeMap.get(partKey) == null) {
+        throw new SemanticException("Filter predicate on non-partitioned column: " + partKey + " not supported");
+      }
+      //todo: Add Support For Other Data Types
+      if (FilterType.fromType(partColNameTypeMap.get(partKey)) == FilterType.Invalid) {
+        throw new SemanticException("Filter predicate only supported for string data types");
+      }
+
+      if (node.getChild(1).getChild(0) != null) {
+        throw new SemanticException("Unsupported filter predicate");
+      }
+
+      filterStr = String.join(" ", filterStr, partKey, operator,
+          node.getChild(1).getText());
+      return filterStr;
+    }
+
+    for (int i = 0; i < node.getChildCount(); i++) {
 
 Review comment:
   > what's this? the grammar accepts a generic whereclause...the `isValidOperator` function accepts a "few" of them...and all the others are handled by this `for` loop?
   > why did you choose to use the `whereClause`?
   > how would the current code handle `a=b`? I guess it would interpret it as `a='b'`
   > I'm not sure but calling a method like `ExprNodeTypeCheck.genExprNode`; could probably make your life easier - this code is actually doing the parsing from the ground up....if you are already using `whereClause` you could try to benefit from it a little...
   > 
   > so right now I suspect the condition `a=1 or b=2` or `COALESCE(a=1,b=2)` will be handled as if it would be `a=1 and b=2` ?
   > 
   > all the conditionals are limited to have the column on the left side - this limitation will not be obvious from the exceptions
   > 
   > concatenating the `filterStr` inside this method is a bit wierd...
   
   Yes it makes sense! Going with whereClause makes things difficult. I should have used something similar to partitionSpec?. 
   I have uploaded a design doc in the jira, Meanwhile i am working on the changes can you please review the design doc? Early feedbacks will be really helpful before doing the actual code.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 418279)
    Time Spent: 0.5h  (was: 20m)

> Add Predicate Filtering In MSCK REPAIR TABLE
> --------------------------------------------
>
>                 Key: HIVE-22957
>                 URL: https://issues.apache.org/jira/browse/HIVE-22957
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22957.01.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently MSCK command supports full repair of table (all partitions) or some subset of partitions based on partitionSpec. The aim of this jira is to introduce a filterExp (=, !=, <, >, >=, <=, LIKE) in MSCK command so that a larger subset of partitions can be recovered (added/deleted) without firing a full repair might take time if the no. of partitions are huge.
> *Approach*:
> The initial approach is to add a where clause in MSCK command Eg: MCK REPAIR TABLE <tbl_name> ADD|DROP|SYNC PARTITIONS WHERE <pcol1> <filter_operator> <value> AND ....
> *Flow:*
> 1) Parse the where clause and generate filterExpression
> 2) fetch all the partitions from the metastore which matches the filter expression
> 3) fetch all the partition file from the filesystem
> 4) remove all the partition path which does not match with the filter expression
> 5) Based on ADD | DROP | SYNC do the remaining steps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)