Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/04/06 00:44:31 UTC

[GitHub] [hive] sam-an-cloudera opened a new pull request #968: Hive23111

sam-an-cloudera opened a new pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968
 
 
   This addresses the review comments on HIVE-23111: https://issues.apache.org/jira/browse/HIVE-23111

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] prasanthj commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404239311
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
+        //do we add the partition to the filtered result [year=2001/month=1]
+        if (partition.indexOf(col) == -1){
+          isMatch = false;
+          break;
+        }
+      }
+      if (isMatch){
+        partNamesSeq.add(partition);
+      }
+    }
+    partitionNames.clear();
+    partitionNames.addAll(partNamesSeq);
+    LOG.info("The returned partition list is of size: {}", partitionNames.size());
+    for(String s : partitionNames){
+      if (LOG.isDebugEnabled()) {
 
 Review comment:
   This if condition can be moved outside of the loop, so it is checked once rather than on every iteration.
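   A minimal sketch of the suggestion, under stated assumptions: the quoted hunk is cut off before the loop body, so the per-partition debug line below is a guess, and java.util.logging stands in for the slf4j LOG used in the patch. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class HoistedDebugCheck {
  // Logs every partition name at debug (FINE) level and returns how many were logged.
  // The level check wraps the whole loop, so it runs once instead of once per element.
  static int logPartitionNames(Logger log, List<String> partitionNames) {
    int logged = 0;
    if (log.isLoggable(Level.FINE)) {
      for (String name : partitionNames) {
        // Assumed loop body: the original hunk does not show what is logged here.
        log.fine("Partition: " + name);
        logged++;
      }
    }
    return logged;
  }
}
```

   When debug logging is off, the list is not traversed at all, which is the practical gain for large partition lists.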



[GitHub] [hive] prasanthj commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404270527
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
 
 Review comment:
   Substring matching is error-prone. Can we split/join and do an exact match?
   For example, year=2001 and next_year=2001 will both match a search for year=2001. Hence my concern.
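   One way the split-and-compare idea could look, as a sketch rather than the change that was eventually committed: each partition name is split on "/" into its key=value components, and a partition is kept only if every requested component is present as a whole element. The class and method names are hypothetical, and the sketch assumes both sides use the same escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExactPartitionMatch {
  // Keeps only the partition names whose "/"-separated components include
  // every entry of wanted as an exact element (no substring matching).
  static List<String> filter(List<String> partitionNames, Set<String> wanted) {
    List<String> matched = new ArrayList<>();
    for (String partition : partitionNames) {
      Set<String> components = new HashSet<>(Arrays.asList(partition.split("/")));
      if (components.containsAll(wanted)) {
        matched.add(partition);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "year=2001/month=1", "next_year=2001/month=1", "year=2001/month=12");
    Set<String> wanted = new HashSet<>(Arrays.asList("year=2001", "month=1"));
    System.out.println(filter(names, wanted)); // prints [year=2001/month=1]
  }
}
```

   With the indexOf check from the patch, next_year=2001/month=1 and year=2001/month=12 would both be accepted as well, since year=2001 and month=1 occur inside them as substrings.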



[GitHub] [hive] sam-an-cloudera commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
sam-an-cloudera commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404256095
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
 
 Review comment:
   partValueSet contains one entry per column expression, while each partition name is longer. For example, the partition is "year=2001/month=1" and partValueSet is [year=2001, month=1], so each col in partValueSet is a substring of the partition. A contains check on partValueSet would not work, because the full partition name is never an element of the set.
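   The difference between the two checks can be illustrated with a small sketch. The values come from the example above; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ContainsVersusSubstring {
  // True only when the whole partition name is itself an element of the set.
  static boolean setContainsName(Set<String> partValueSet, String partition) {
    return partValueSet.contains(partition);
  }

  // True when every element of the set occurs somewhere inside the partition name,
  // which is what the indexOf loop in the patch checks.
  static boolean allAreSubstrings(Set<String> partValueSet, String partition) {
    for (String col : partValueSet) {
      if (partition.indexOf(col) == -1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> partValueSet = new HashSet<>(Arrays.asList("year=2001", "month=1"));
    String partition = "year=2001/month=1";
    System.out.println(setContainsName(partValueSet, partition));  // prints false
    System.out.println(allAreSubstrings(partValueSet, partition)); // prints true
  }
}
```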



[GitHub] [hive] sam-an-cloudera commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
sam-an-cloudera commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404256320
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java
 ##########
 @@ -654,6 +656,42 @@ public void testNoPartitionRetentionForReplTarget() throws TException, Interrupt
     assertEquals(3, partitions.size());
   }
 
+  @Test
+  public void testPartitionExprFilter() throws TException, IOException {
+    String dbName = "db10";
+    String tableName = "tbl10";
+    Map<String, Column> colMap = buildAllColumns();
+    List<String> partKeys = Lists.newArrayList("state", "dt");
 
 Review comment:
   Sure, will add a timestamp test case as well.



[GitHub] [hive] sam-an-cloudera commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
sam-an-cloudera commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404253421
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
+        //do we add the partition to the filtered result [year=2001/month=1]
+        if (partition.indexOf(col) == -1){
+          isMatch = false;
+          break;
+        }
+      }
+      if (isMatch){
+        partNamesSeq.add(partition);
+      }
+    }
+    partitionNames.clear();
+    partitionNames.addAll(partNamesSeq);
+    LOG.info("The returned partition list is of size: {}", partitionNames.size());
+    for(String s : partitionNames){
+      if (LOG.isDebugEnabled()) {
 
 Review comment:
   Will do.



[GitHub] [hive] prasanthj commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404239072
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
 
 Review comment:
   Looks like we are doing an exact match here?
   If so, should we just do a contains check on partValueSet?



[GitHub] [hive] sam-an-cloudera commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
sam-an-cloudera commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404447903
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
 ##########
 @@ -44,6 +53,46 @@ public String convertExprToFilter(final byte[] exprBytes, final String defaultPa
   @Override
   public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr, String
     defaultPartitionName, List<String> partitionNames) throws MetaException {
+    String partExpr = new String(expr, StandardCharsets.UTF_8);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Partition expr: {}", expr);
+    }
+    //This is to find in partitionNames all that match expr
+    //reverse of the Msck.makePartExpr
+    Set<String> partValueSet = new HashSet<>();
+    String[] parts = partExpr.split(" AND ");
+    for ( String part : parts){
+      String[] colAndValue = part.split("=");
+      String key = FileUtils.unescapePathName(colAndValue[0]);
+      //take the value inside without the single quote marks, e.g. '2018-10-30' becomes 2018-10-30
+      String value = FileUtils.unescapePathName(colAndValue[1].substring(1, colAndValue[1].length()-1));
+      partValueSet.add(key+"="+value);
+    }
+
+    List<String> partNamesSeq =  new ArrayList<>();
+    for (String partition : partitionNames){
+      boolean isMatch = true;
+      for ( String col : partValueSet){
+        //list of partitions [year=2001/month=1, year=2002/month=2, year=2001/month=3]
+        //Given expr: e.g. year='2001' AND month='1'. Only when all the expressions in the expr can be found,
 
 Review comment:
   Ah, I see. I will change it accordingly. Thanks for catching it.



[GitHub] [hive] prasanthj commented on a change in pull request #968: Hive23111

Posted by GitBox <gi...@apache.org>.
prasanthj commented on a change in pull request #968: Hive23111
URL: https://github.com/apache/hive/pull/968#discussion_r404239808
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java
 ##########
 @@ -654,6 +656,42 @@ public void testNoPartitionRetentionForReplTarget() throws TException, Interrupt
     assertEquals(3, partitions.size());
   }
 
+  @Test
+  public void testPartitionExprFilter() throws TException, IOException {
+    String dbName = "db10";
+    String tableName = "tbl10";
+    Map<String, Column> colMap = buildAllColumns();
+    List<String> partKeys = Lists.newArrayList("state", "dt");
 
 Review comment:
   Can you add a test case for timestamp too? Timestamp is always the problematic type for partition columns.
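   One plausible reason timestamps cause trouble, offered as an assumption rather than something stated in this thread: partition names are stored path-escaped, so a colon in a timestamp value appears as %3A, and a comparison between an escaped name and an unescaped value will not match. The sketch below uses a hypothetical unescape helper as a simplified stand-in for FileUtils.unescapePathName; the partition name is an invented example.

```java
class TimestampPartitionName {
  // Hypothetical helper: decodes %XX sequences into characters.
  // A simplified stand-in for Hive's FileUtils.unescapePathName, not the real implementation.
  static String unescape(String path) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < path.length(); i++) {
      char c = path.charAt(i);
      if (c == '%' && i + 2 < path.length()) {
        int code;
        try {
          code = Integer.parseInt(path.substring(i + 1, i + 3), 16);
        } catch (NumberFormatException e) {
          code = -1; // not a valid escape, keep the '%' literally
        }
        if (code >= 0) {
          sb.append((char) code);
          i += 2; // skip the two hex digits just consumed
          continue;
        }
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String stored = "ts=2020-04-06 00%3A44%3A31"; // escaped form, as assumed for a stored name
    String wanted = "ts=2020-04-06 00:44:31";     // unescaped form built from the expression
    System.out.println(stored.contains(wanted));           // prints false
    System.out.println(unescape(stored).equals(wanted));   // prints true
  }
}
```

   A timestamp test case would show whether both sides of the comparison in filterPartitionsByExpr are in the same form.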
