Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/04/07 11:00:22 UTC

[GitHub] [hive] ayushtkn opened a new pull request, #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

ayushtkn opened a new pull request, #3187:
URL: https://github.com/apache/hive/pull/3187

   HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] ayushtkn commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845927749


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
     case "string":
       return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
     default:
+      if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   That can't be done directly. The type names aren't the fixed strings char or varchar; they include the length, e.g. char(10), char(5), varchar(5) or varchar(6). So putting char or varchar as labels in the switch-case won't match them.
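
A minimal, self-contained sketch of the mismatch described here. This is not Hive code: the class `TypeSwitchDemo` and the method `dispatch` are hypothetical stand-ins for the switch in `getBoundaryScanner`, and the type-name strings are the examples from the comment.

```java
// Hypothetical stand-in for the dispatch in getBoundaryScanner; not Hive code.
class TypeSwitchDemo {

  // Returns the label of the branch a plain string switch would take.
  static String dispatch(String typeString) {
    switch (typeString) {
      case "string":
        return "string";
      case "char":
        return "char";
      case "varchar":
        return "varchar";
      default:
        return "default";
    }
  }

  public static void main(String[] args) {
    // Parameterized names carry the length, so they never equal the bare labels.
    System.out.println(dispatch("char(10)"));   // prints "default"
    System.out.println(dispatch("varchar(5)")); // prints "default"
    System.out.println(dispatch("string"));     // prints "string"
  }
}
```

A string switch compares whole strings, so the `char` and `varchar` labels would only ever match names that carry no length.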





[GitHub] [hive] ayushtkn commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r853870852


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
     case "string":
       return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
     default:
+      if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   Done. As discussed, I pulled all of them together to avoid multiple branches.





[GitHub] [hive] ayushtkn commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r853870053


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -1214,6 +1223,55 @@ public boolean isEqualPrimitive(String s1, String s2) {
   }
 }
 
+class CharValueBoundaryScanner extends SingleValueBoundaryScanner {
+  public CharValueBoundaryScanner(BoundaryDef start, BoundaryDef end,
+      OrderExpressionDef expressionDef, boolean nullsLast) {
+    super(start, end, expressionDef, nullsLast);
+  }
+
+  @Override
+  public boolean isDistanceGreater(Object v1, Object v2, int amt) {
+    HiveChar s1 = PrimitiveObjectInspectorUtils.getHiveChar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveChar s2 = PrimitiveObjectInspectorUtils.getHiveChar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return s1 != null && s2 != null && s1.compareTo(s2) > 0;
+  }
+
+  @Override
+  public boolean isEqual(Object v1, Object v2) {
+    HiveChar s1 = PrimitiveObjectInspectorUtils.getHiveChar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveChar s2 = PrimitiveObjectInspectorUtils.getHiveChar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
+  }
+}
+
+class VarcharValueBoundaryScanner extends SingleValueBoundaryScanner {
+  public VarcharValueBoundaryScanner(BoundaryDef start, BoundaryDef end,
+      OrderExpressionDef expressionDef, boolean nullsLast) {
+    super(start, end, expressionDef, nullsLast);
+  }
+
+  @Override
+  public boolean isDistanceGreater(Object v1, Object v2, int amt) {
+    HiveVarchar s1 = PrimitiveObjectInspectorUtils.getHiveVarchar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveVarchar s2 = PrimitiveObjectInspectorUtils.getHiveVarchar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return s1 != null && s2 != null && s1.compareTo(s2) > 0;
+  }
+
+  @Override
+  public boolean isEqual(Object v1, Object v2) {

Review Comment:
   Added





[GitHub] [hive] abstractdog commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845925509


##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -508,6 +508,7 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "with ${hive.scratch.dir.permission}."),
     REPLDIR("hive.repl.rootdir","/user/${system:user.name}/repl/",
         "HDFS root dir for all replication dumps."),
+//    HS2 IP2 DistCp hdfs://namenodePort:port/use/hive/w/table1 Ip2:/port:...table1

Review Comment:
   This doesn't look related to the patch; I guess it is a leftover.





[GitHub] [hive] abstractdog commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845922550


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
     case "string":
       return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
     default:
+      if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   Putting this into the default branch looks strange to me. Why not handle it the same way as decimal above?





[GitHub] [hive] ayushtkn commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r853869839


##########
ql/src/test/queries/clientpositive/vector_ptf_bounded_start.q:
##########
@@ -3,24 +3,31 @@ set hive.vectorized.execution.enabled=true;
 set hive.vectorized.execution.ptf.enabled=true;
 set hive.fetch.task.conversion=none;
 
-CREATE TABLE vector_ptf_part_simple_text(p_mfgr string, p_name string, p_date date, p_retailprice double, rowindex int)
+CREATE TABLE vector_ptf_part_simple_text(p_mfgr string, p_name string, p_date date, p_retailprice double,
+        p_type char(1), p_varchar varchar(5), rowindex int)
         ROW FORMAT DELIMITED
-        FIELDS TERMINATED BY '\t'
+        FIELDS TERMINATED BY ','
         STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '../../data/files/vector_ptf_part_simple_all_datatypes.txt' OVERWRITE INTO TABLE vector_ptf_part_simple_text;
 
+SELECT * from vector_ptf_part_simple_text;
+
 CREATE TABLE vector_ptf_part_simple_orc (p_mfgr string, p_name string, p_date date, p_timestamp timestamp, 
-p_int int, p_retailprice double, p_decimal decimal(10,4), rowindex int) stored as orc;
+p_int int, p_retailprice double, p_decimal decimal(10,4), p_type char(1), p_varchar varchar(5),rowindex int) stored

Review Comment:
   Changed.





[GitHub] [hive] ayushtkn merged pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
ayushtkn merged PR #3187:
URL: https://github.com/apache/hive/pull/3187




[GitHub] [hive] abstractdog commented on a diff in pull request #3187: HIVE-26074: PTF Vectorization: BoundaryScanner for varchar.

Posted by GitBox <gi...@apache.org>.
abstractdog commented on code in PR #3187:
URL: https://github.com/apache/hive/pull/3187#discussion_r845932714


##########
ql/src/test/queries/clientpositive/vector_ptf_bounded_start.q:
##########
@@ -3,24 +3,31 @@ set hive.vectorized.execution.enabled=true;
 set hive.vectorized.execution.ptf.enabled=true;
 set hive.fetch.task.conversion=none;
 
-CREATE TABLE vector_ptf_part_simple_text(p_mfgr string, p_name string, p_date date, p_retailprice double, rowindex int)
+CREATE TABLE vector_ptf_part_simple_text(p_mfgr string, p_name string, p_date date, p_retailprice double,
+        p_type char(1), p_varchar varchar(5), rowindex int)
         ROW FORMAT DELIMITED
-        FIELDS TERMINATED BY '\t'
+        FIELDS TERMINATED BY ','
         STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '../../data/files/vector_ptf_part_simple_all_datatypes.txt' OVERWRITE INTO TABLE vector_ptf_part_simple_text;
 
+SELECT * from vector_ptf_part_simple_text;
+
 CREATE TABLE vector_ptf_part_simple_orc (p_mfgr string, p_name string, p_date date, p_timestamp timestamp, 
-p_int int, p_retailprice double, p_decimal decimal(10,4), rowindex int) stored as orc;
+p_int int, p_retailprice double, p_decimal decimal(10,4), p_type char(1), p_varchar varchar(5),rowindex int) stored

Review Comment:
   Let this column be named p_char instead of p_type.



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -1214,6 +1223,55 @@ public boolean isEqualPrimitive(String s1, String s2) {
   }
 }
 
+class CharValueBoundaryScanner extends SingleValueBoundaryScanner {
+  public CharValueBoundaryScanner(BoundaryDef start, BoundaryDef end,
+      OrderExpressionDef expressionDef, boolean nullsLast) {
+    super(start, end, expressionDef, nullsLast);
+  }
+
+  @Override
+  public boolean isDistanceGreater(Object v1, Object v2, int amt) {
+    HiveChar s1 = PrimitiveObjectInspectorUtils.getHiveChar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveChar s2 = PrimitiveObjectInspectorUtils.getHiveChar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return s1 != null && s2 != null && s1.compareTo(s2) > 0;
+  }
+
+  @Override
+  public boolean isEqual(Object v1, Object v2) {
+    HiveChar s1 = PrimitiveObjectInspectorUtils.getHiveChar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveChar s2 = PrimitiveObjectInspectorUtils.getHiveChar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
+  }
+}
+
+class VarcharValueBoundaryScanner extends SingleValueBoundaryScanner {
+  public VarcharValueBoundaryScanner(BoundaryDef start, BoundaryDef end,
+      OrderExpressionDef expressionDef, boolean nullsLast) {
+    super(start, end, expressionDef, nullsLast);
+  }
+
+  @Override
+  public boolean isDistanceGreater(Object v1, Object v2, int amt) {
+    HiveVarchar s1 = PrimitiveObjectInspectorUtils.getHiveVarchar(v1,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    HiveVarchar s2 = PrimitiveObjectInspectorUtils.getHiveVarchar(v2,
+        (PrimitiveObjectInspector) expressionDef.getOI());
+    return s1 != null && s2 != null && s1.compareTo(s2) > 0;
+  }
+
+  @Override
+  public boolean isEqual(Object v1, Object v2) {

Review Comment:
   Can you please add an isEqual test case to TestValueBoundaryScanner?
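
As a rough illustration of the cases such a test might cover, the sketch below replays the null handling of the `isEqual` and `isDistanceGreater` expressions shown for `CharValueBoundaryScanner` in the diff. It is an assumption-laden stand-in, not Hive code: plain `String` values replace `HiveChar`/`HiveVarchar`, and the class name `NullSafeCompareDemo` is hypothetical.

```java
// Plain-String stand-in for the scanner comparisons in the diff; not Hive code.
class NullSafeCompareDemo {

  // Same boolean expression as isEqual in CharValueBoundaryScanner:
  // two nulls are equal, a null and a non-null value are not.
  static boolean isEqual(String s1, String s2) {
    return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
  }

  // Same boolean expression as isDistanceGreater in the diff: false whenever
  // either side is null. The amt argument is unused, as in the diff.
  static boolean isDistanceGreater(String s1, String s2, int amt) {
    return s1 != null && s2 != null && s1.compareTo(s2) > 0;
  }

  public static void main(String[] args) {
    System.out.println(isEqual(null, null));            // prints "true"
    System.out.println(isEqual("a", null));             // prints "false"
    System.out.println(isEqual(null, "a"));             // prints "false"
    System.out.println(isEqual("a", "a"));              // prints "true"
    System.out.println(isDistanceGreater("b", "a", 0)); // prints "true"
    System.out.println(isDistanceGreater(null, "a", 0)); // prints "false"
  }
}
```

The real test would need to build the values through the object inspector of the order expression, which this sketch leaves out.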



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java:
##########
@@ -768,6 +774,9 @@ public static SingleValueBoundaryScanner getBoundaryScanner(BoundaryDef start, B
     case "string":
       return new StringPrimitiveValueBoundaryScanner(start, end, exprDef, nullsLast);
     default:
+      if (typeString.startsWith("char") || typeString.startsWith("varchar")) {

Review Comment:
   The same situation is already handled for decimal above:
   ```
       if (typeString.startsWith("decimal")){
         typeString = "decimal"; //DecimalTypeInfo.getTypeName() includes scale/precision: "decimal(10,4)"
       }
   ```
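
Extending that idea to char and varchar could look roughly like the sketch below. It is only an illustration under assumptions, not the code that was merged: the class `TypeNameNormalizer` and the method `normalize` are hypothetical, and only the decimal branch comes from the snippet quoted above.

```java
// Hypothetical helper mirroring the decimal handling quoted above; not Hive code.
class TypeNameNormalizer {

  // Collapses a parameterized type name to its base name so that a switch
  // over fixed labels can match it.
  static String normalize(String typeString) {
    if (typeString.startsWith("decimal")) {
      return "decimal"; // e.g. "decimal(10,4)"
    }
    // "varchar(5)" does not start with "char", so these two checks are independent.
    if (typeString.startsWith("varchar")) {
      return "varchar"; // e.g. "varchar(5)"
    }
    if (typeString.startsWith("char")) {
      return "char"; // e.g. "char(10)"
    }
    return typeString; // names without parameters pass through unchanged
  }

  public static void main(String[] args) {
    System.out.println(normalize("decimal(10,4)")); // prints "decimal"
    System.out.println(normalize("char(10)"));      // prints "char"
    System.out.println(normalize("varchar(5)"));    // prints "varchar"
    System.out.println(normalize("string"));        // prints "string"
  }
}
```

A prefix test also matches any other type name that happens to begin with the same letters, so it is only safe while no such names exist.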


