You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/12/26 12:30:56 UTC

[GitHub] [drill] Leon-WTF opened a new pull request #2416: Support reverse truncation for split_part udf

Leon-WTF opened a new pull request #2416:
URL: https://github.com/apache/drill/pull/2416


   # [DRILL-8094](https://issues.apache.org/jira/browse/DRILL-8094): Support reverse truncation for split_part udf
   
   ## Description
   Suport split_part('a,b,c,d', ',' , -2, -1) = 'c,d' and split_part('a,b,c,d', ',' , -3) = 'b'
   
   ## Documentation
   
   ## Testing
   UT added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton merged pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton merged pull request #2416:
URL: https://github.com/apache/drill/pull/2416


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] Leon-WTF commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r780648058



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {

Review comment:
       > Sigh, 1-based indexing. I think the logic you chose in this PR for non-positive index values is the best we can do, considering.
   @jnturton Hi James, anything wrong here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] Leon-WTF commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r780648058



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {

Review comment:
       > Sigh, 1-based indexing. I think the logic you chose in this PR for non-positive index values is the best we can do, considering.
   @jnturton Hi James, anything wrong here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2416:
URL: https://github.com/apache/drill/pull/2416#issuecomment-1014685897


   Hi @Leon-WTF, sorry about the long delay here.  I wanted to try to remove `if` statements but hadn't noticed that lazy splitting is possible for postive index values making the cases more different than I'd realised.  I came up with an alternative implementation with the negative index case based on reversing the original string, and the delimiter, then doing lazy splitting in the forward direction and then reversing the selected part for the answer but in the end I think it was worse than what's here.
   
   I'll approve shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] Leon-WTF commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r776938643



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       > What happens when arrayIndex < 0 here? An informative error would be better than uninitialised result data
   
   arrayIndex < 0 is not an error, it will return empty string, see the case below in UT.
       testBuilder()
         .sqlQuery("select split_part('a,b,c', ',', -4) res1 from (values(1))")
         .ordered()
         .baselineColumns("res1")
         .baselineValues("")
         .go();
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r776967381



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       For performance we try to avoid branching inside function `eval` methods whenever possible.  Can any of these `if` statements be removed?  E.g. by using modular arithmetic to calculate the wrapped array index.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       Okay that does actually appear to be consistent with what happens when arrayIndex is past the end of the array.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       Perhaps in the context of this sort of string processing, branching penalities are not very significant 🤔




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] Leon-WTF commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r788824571



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       yes, you are right, at least the check of input index needs to be moved to setup function. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r776661535



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       What happens when arrayIndex < 0 here?  An informative error would be better than uninitialised result data




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on pull request #2416:
URL: https://github.com/apache/drill/pull/2416#issuecomment-1017579431


   @Leon-WTF if there is a chance to move any logic to setup() just send another PR reusing this existing Jira DRILL-8094.  But if index is allowed to be a variable expression and not just a constant then maybe nothing can move to setup()...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2416:
URL: https://github.com/apache/drill/pull/2416#issuecomment-1014689249


   P.S. Had Google implemented a lazy Splitter that operates backwards from the end of the string there might have been a nice simplification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] Leon-WTF commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r776938643



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       > What happens when arrayIndex < 0 here? An informative error would be better than uninitialised result data
   
   I think arrayIndex < 0 is not an error, it will return the empty string, see the case below in UT.
       testBuilder()
         .sqlQuery("select split_part('a,b,c', ',', -4) res1 from (values(1))")
         .ordered()
         .baselineColumns("res1")
         .baselineValues("")
         .go();
   Same as:
       testBuilder()
           .sqlQuery("select split_part('a,b,c', ',', 4) res1 from (values(1))")
           .ordered()
           .baselineColumns("res1")
           .baselineValues("")
           .go();




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2416:
URL: https://github.com/apache/drill/pull/2416#discussion_r776661535



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {
         throw org.apache.drill.common.exceptions.UserException.functionError()
-          .message("Index in split_part must be positive, value provided was "
-            + index.value).build();
+          .message("Index in split_part can not be zero").build();
       }
       String inputString = org.apache.drill.exec.expr.fn.impl.
         StringFunctionHelpers.getStringFromVarCharHolder(in);
-      int arrayIndex = index.value - 1;
-      String result =
-              (String) com.google.common.collect.Iterables.get(splitter.split(inputString), arrayIndex, "");
+      String result = "";
+      if (index.value < 0) {
+        java.util.List<String> splits = splitter.splitToList(inputString);
+        int size = splits.size();
+        int arrayIndex = size + index.value;
+        if (arrayIndex >= 0) {

Review comment:
       What happens when arrayIndex < 0 here?  An informative error would be better than unitialised result data

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
##########
@@ -416,16 +417,25 @@ public void setup() {
 
     @Override
     public void eval() {
-      if (index.value < 1) {
+      if (index.value == 0) {

Review comment:
       Sigh, 1-based indexing.  I think the logic you chose in this PR for non-positive index values is the best we can do, considering.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2416: DRILL-8094: Support reverse truncation for split_part udf

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2416:
URL: https://github.com/apache/drill/pull/2416#issuecomment-1017579431


   @Leon-WTF if there is a chance to move any logic to setup() just send another PR reusing this existing Jira DRILL-8094.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org