Posted to commits@datasketches.apache.org by "okumin (via GitHub)" <gi...@apache.org> on 2023/02/21 12:46:42 UTC

[GitHub] [datasketches-hive] okumin opened a new pull request, #66: Add @Description to UDFs registered in Apache Hive

okumin opened a new pull request, #66:
URL: https://github.com/apache/datasketches-hive/pull/66

   I see many WARN logs like the following when using Hive 4, because the UDF classes lack the `@Description` annotation.
   
   ```
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,535  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.cpc.UnionSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,538  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.hll.UnionSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,540  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.theta.IntersectSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,541  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.theta.EstimateSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,541  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.theta.ExcludeSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,542  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.theta.UnionSketchUDF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   pod/hive-hiveserver2-84dbfbc99c-d7sll: 2023-02-19T11:05:51,542  WARN [main] exec.FunctionRegistry: UDF Class org.apache.hive.org.apache.datasketches.hive.tuple.ArrayOfDoublesSketchToValuesUDTF does not have description. Please annotate the class with the org.apache.hadoop.hive.ql.exec.Description annotation and provide the description of the function.
   ```
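
For readers unfamiliar with the warning, here is a minimal sketch of the kind of check that could produce it, assuming the registry simply looks the annotation up via reflection. The `Description` type below is a local stand-in declared only so the example compiles without Hive on the classpath (it is not `org.apache.hadoop.hive.ql.exec.Description`), and `check`, `AnnotatedUdf`, and `PlainUdf` are hypothetical names used for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DescriptionCheck {

  // Hypothetical stand-in for Hive's Description annotation.
  // RUNTIME retention is required for the reflective lookup below to see it.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Description {
    String name() default "";
    String value() default "";
    String extended() default "";
  }

  // A class annotated the way this pull request annotates the UDFs.
  @Description(
      name = "unionSketch",
      value = "_FUNC_(firstSketch, secondSketch) - Compute the union of the given sketches")
  static class AnnotatedUdf { }

  // A class without the annotation, like the UDFs named in the WARN logs.
  static class PlainUdf { }

  // Returns a warning when the annotation is missing, otherwise a short summary.
  // Substituting _FUNC_ with the function name is done here for illustration only.
  static String check(Class<?> udfClass) {
    Description description = udfClass.getAnnotation(Description.class);
    if (description == null) {
      return "WARN: UDF Class " + udfClass.getName() + " does not have description.";
    }
    return "OK: " + description.value().replace("_FUNC_", description.name());
  }

  public static void main(String[] args) {
    System.out.println(check(AnnotatedUdf.class));
    System.out.println(check(PlainUdf.class));
  }
}
```

Under these assumptions, only the class without the annotation yields a warning, which is why adding `@Description` to each registered UDF silences the log lines.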


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-hive] okumin commented on a diff in pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "okumin (via GitHub)" <gi...@apache.org>.
okumin commented on code in PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66#discussion_r1114195982


##########
src/main/java/org/apache/datasketches/hive/cpc/UnionSketchUDF.java:
##########
@@ -24,12 +24,22 @@
 import org.apache.datasketches.cpc.CpcSketch;
 import org.apache.datasketches.cpc.CpcUnion;
 import org.apache.datasketches.hive.common.BytesWritableHelper;
+import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.BytesWritable;
 
 /**
  * Hive union sketch UDF.
  */
+@Description(
+    name = "unionSketch",
+    value = "_FUNC_(firstSketch, secondSketch[, lgK[, seed]]) - Compute the union of the given "
+        + "sketches with the given size and seed",
+    extended = "The return value is a binary blob that can be operated on by other sketch related functions."
+        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."

Review Comment:
   Thanks. I fixed all typos. Please let me know if you prefer to have separate PRs.
   https://github.com/apache/datasketches-hive/pull/66/commits/e3bafbe37aecd3566369676299cf1c5437374d54
   
   ```
   $ git grep rlative
   src/main/java/org/apache/datasketches/hive/cpc/DataToSketchUDAF.java:    + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   src/main/java/org/apache/datasketches/hive/cpc/UnionSketchUDAF.java:    + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   src/main/java/org/apache/datasketches/hive/cpc/UnionSketchUDF.java:        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   src/main/java/org/apache/datasketches/hive/hll/DataToSketchUDAF.java:    + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   src/main/java/org/apache/datasketches/hive/hll/UnionSketchUDAF.java:    + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   src/main/java/org/apache/datasketches/hive/hll/UnionSketchUDF.java:        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
   ```





[GitHub] [datasketches-hive] AlexanderSaydakov commented on a diff in pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66#discussion_r1113422603


##########
src/main/java/org/apache/datasketches/hive/cpc/UnionSketchUDF.java:
##########
@@ -24,12 +24,22 @@
 import org.apache.datasketches.cpc.CpcSketch;
 import org.apache.datasketches.cpc.CpcUnion;
 import org.apache.datasketches.hive.common.BytesWritableHelper;
+import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.BytesWritable;
 
 /**
  * Hive union sketch UDF.
  */
+@Description(
+    name = "unionSketch",
+    value = "_FUNC_(firstSketch, secondSketch[, lgK[, seed]]) - Compute the union of the given "
+        + "sketches with the given size and seed",
+    extended = "The return value is a binary blob that can be operated on by other sketch related functions."
+        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."

Review Comment:
   typo "rlative"





[GitHub] [datasketches-hive] AlexanderSaydakov merged pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov merged PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66




[GitHub] [datasketches-hive] AlexanderSaydakov commented on a diff in pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66#discussion_r1113423325


##########
src/main/java/org/apache/datasketches/hive/hll/UnionSketchUDF.java:
##########
@@ -23,12 +23,23 @@
 import org.apache.datasketches.hll.HllSketch;
 import org.apache.datasketches.hll.TgtHllType;
 import org.apache.datasketches.hll.Union;
+import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.BytesWritable;
 
 /**
  * Hive union sketch UDF.
  */
+@Description(
+    name = "unionSketch",
+    value = "_FUNC_(firstSketch, secondSketch[, lgK[, type]]) - Compute the union of the given "
+        + "sketches with the given size and seed",
+    extended = "The return value is a binary blob that can be operated on by other sketch related functions."
+        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."

Review Comment:
   same typo





[GitHub] [datasketches-hive] AlexanderSaydakov commented on a diff in pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov commented on code in PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66#discussion_r1113448108


##########
src/main/java/org/apache/datasketches/hive/cpc/UnionSketchUDF.java:
##########
@@ -24,12 +24,22 @@
 import org.apache.datasketches.cpc.CpcSketch;
 import org.apache.datasketches.cpc.CpcUnion;
 import org.apache.datasketches.hive.common.BytesWritableHelper;
+import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.BytesWritable;
 
 /**
  * Hive union sketch UDF.
  */
+@Description(
+    name = "unionSketch",
+    value = "_FUNC_(firstSketch, secondSketch[, lgK[, seed]]) - Compute the union of the given "
+        + "sketches with the given size and seed",
+    extended = "The return value is a binary blob that can be operated on by other sketch related functions."
+        + " The lgK parameter controls the sketch size and rlative error expected from the sketch."
+        + " It is optional an must be from 4 to 26. The default is 11, which is expected to yield errors"
+        + " of roughly +-1.5% in the estimation of uniques with 95% confidence."

Review Comment:
   where did this claim come from?





[GitHub] [datasketches-hive] okumin commented on pull request #66: Add @Description to UDFs registered in Apache Hive

Posted by "okumin (via GitHub)" <gi...@apache.org>.
okumin commented on PR #66:
URL: https://github.com/apache/datasketches-hive/pull/66#issuecomment-1441149927

   Thanks for your review!

