You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/07/19 02:30:05 UTC

[GitHub] [hive] kasakrisz commented on a diff in pull request #3433: HIVE-26294: Allow substr to take bigint params

kasakrisz commented on code in PR #3433:
URL: https://github.com/apache/hive/pull/3433#discussion_r923998397


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -64,6 +65,27 @@ public UDFSubstr() {
     r = new Text();
   }
 
+  public Text evaluate(Text t, LongWritable pos, LongWritable len) {
+
+    if ((t == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    r.clear();
+    if ((len.get() <= 0)) {
+      return r;
+    }
+
+    String s = t.toString();
+    int[] index = makeIndex(pos.get(), len.get(), s.length());
+    if (index == null) {
+      return r;
+    }
+
+    r.set(s.substring(index[0], index[1]));
+    return r;
+  }

Review Comment:
   Java String accepts int parameters only
   https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
   
   How about creating a common `evaluate` which receives int parameters and both the `LongWritable` and `IntWritable` version call it?
   
   ```
   public Text evaluate(Text t, IntWritable pos, IntWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      return evaluate(t, pos.get(), len.get());
   }
     
   public Text evaluate(Text t, LongWritable pos, LongWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      // Handle long int overflow
      return evaluate(t, (int)pos.get(), (int)len.get());
   }
   
   private Text evaluate(Text t, int pos, int len) {
   ...
   }
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable len) {
     return r;
   }
 
+  private int[] makeIndex(long pos, long len, int inputLen) {
+    if ((Math.abs(pos) > inputLen)) {
+      return null;
+    }
+
+    long start, end;
+
+    if (pos > 0) {
+      start = pos - 1;
+    } else if (pos < 0) {
+      start = inputLen + pos;
+    } else {
+      start = 0;
+    }
+
+    if ((inputLen - start) < len) {
+      end = inputLen;
+    } else {
+      end = start + len;
+    }
+    index[0] = (int) start;
+    index[1] = (int) end;
+    return index;
+  }

Review Comment:
   This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call the common  one. See my previous comment.



##########
ql/src/test/queries/clientpositive/udf_substr.q:
##########
@@ -76,3 +76,8 @@ SELECT
   substr("abc 玩玩玩 abc", 5),
   substr("abc 玩玩玩 abc", 5, 3)
 FROM src tablesample (1 rows);
+
+SELECT
+  substr('ABC', cast(1 as bigint), cast(0 as bigint)),
+  substr('ABC', cast(4 as bigint))
+FROM src tablesample (1 rows);

Review Comment:
   Could you please add a test when one of the long params has a value which doesn't falls into the int range? How should we handle that?



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) {
 
   private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE);
 
+  private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE);
+
   public Text evaluate(Text s, IntWritable pos) {
     return evaluate(s, pos, maxValue);
   }
 
+  public Text evaluate(Text s, LongWritable pos) {
+    return evaluate(s, pos, maxLongValue);
+  }
+
+  public BytesWritable evaluate(BytesWritable bw, LongWritable pos, LongWritable len) {
+
+    if ((bw == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    if ((len.get() <= 0)) {
+      return new BytesWritable();
+    }
+
+    int[] index = makeIndex(pos.get(), len.get(), bw.getLength());
+    if (index == null) {
+      return new BytesWritable();
+    }
+
+    return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0], index[1]));
+  }

Review Comment:
   The pattern for the `Text` version of the `evaluate` can be applied here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org